Here is the breakdown of performance issues that I have. The ones I think will
lead to decent wins are starred, and Super Shader is triple-starred. This list was
pulled from the JIRA filter I previously sent. The point of this post is to
give everybody an easy-to-see list of performance-related issues (as of a month
ago or so). Some of these might now be done, and this isn't meant to be
comprehensive (although when I put it together I did visit each and every issue
labeled "performance", so it was pretty comprehensive at the time!).
Interested in helping out? I'll be glad to give background on any one of these
issues and pointers as to how to go about working on any of them.
Richard
Architecture
• *RT-9363*: Consider reducing conversions between 'FX' API and scene
graph API
• *RT-24582*: High frequency refresh and Heavy but low priority updates
in the same app (multithreaded render, multi instance…)
• *RT-26492*: Use GCC link time optimization to reduce binary size
• *RT-26531*: Provide independent stage performance
• RT-15083: Replace boolean fields with bit fields
• RT-20397: Remove PGNodes
• RT-23470: Replace java.lang.Math usage in places where precision is
not as important
• RT-23741: Add a hint to let scene graph and Prism know that we are
animating
• RT-23866: Optimize Raspberry PI build for armv6/VFP
• RT-23867: Mac Glass uses gcc -O3 which is known to produce code with
large static footprint
• RT-23868: Glass: Consider collapsing Event classes into a single one.
• RT-24238: Analyze property getters
• RT-29861: Consider replacing Math functions with a faster alternative
• RT-29900: Increased CPU usage on application iconified
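For anyone unfamiliar with the idea behind RT-15083 (replace boolean fields with bit fields), here is a minimal sketch. The class name and flag names are hypothetical and are not taken from the JavaFX sources. Whether this actually shrinks an object depends on the JVM's field layout and alignment, so treat it as an illustration rather than a measured win.

```java
// Hypothetical example: three boolean properties packed into one byte.
public class NodeFlags {
    // Illustrative flag masks (assumed names, not real JavaFX fields).
    private static final int VISIBLE = 1 << 0;
    private static final int MANAGED = 1 << 1;
    private static final int DIRTY   = 1 << 2;

    // One byte holds up to eight flags instead of one boolean field each.
    private byte flags;

    private boolean get(int mask) {
        return (flags & mask) != 0;
    }

    private void set(int mask, boolean on) {
        // Set or clear only the bits selected by the mask.
        flags = (byte) (on ? (flags | mask) : (flags & ~mask));
    }

    public boolean isVisible() { return get(VISIBLE); }
    public void setVisible(boolean v) { set(VISIBLE, v); }

    public boolean isManaged() { return get(MANAGED); }
    public void setManaged(boolean v) { set(MANAGED, v); }

    public boolean isDirty() { return get(DIRTY); }
    public void setDirty(boolean v) { set(DIRTY, v); }
}
```

The accessors keep the same shape as ordinary boolean getters and setters, so callers would not need to change.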
Decora
• RT-2892: Improve performance of Gaussian-based effects
• RT-2908: Use scaled kernel to improve DropShadow performance for node
scale factors < 1
• RT-5347: Prism: finish drop/inner shadow optimizations
• RT-5420: DropShadow effects significantly affect performance
• RT-6935: ColorAdjust effect consumes a lot of memory which could lead
to OOM exception
• RT-8890: Merge and some Blend effects should optimize rendering to
destination
• RT-9225, RT-9226, RT-9227: Various effects don't limit the size of
the input image when requests are outside the clip
• RT-9432: Some of the hand-tuned software effect peers are not
optimized for use with transformed inputs
• RT-9433: The auto-generated software peers for the effects filters do
not handle transformed inputs optimally
• RT-9434: Reflection effect does not clip its output image to the
requested clip bounds
• RT-9437: Prism and Hardware Swing pipelines could perform
PerspectiveTransform directly
• RT-13714: Implement ColorAdjust as a matrix multiplication
Text
• *RT-23467*: Evaluate Native Text Engines
• *RT-23578*: Consider pre-populating the glyph cache with data for the
default font at the default size(s)
• *RT-23705*: Reduce the amount of glyph data copied via Java from
native to see if it helps performance
• *RT-23708*: Investigate if a segmented glyph cache can help
performance
• *RT-30158*: Investigate String Measurement in FX (cache results, call
less, …)
• RT-5069: Text node computes complete text layout, even if clipped to
a much smaller size
• RT-6475: Need new hints to control how Text node is rendered
• RT-21269: Font#loadFont(String,double) downloads file in the main
thread
• RT-23579: Consider using a fixed interval for glyph cache for faster
computation
• RT-23580: Add a variant of text smoothing to deal with rotated text
at higher versus lower quality
• RT-24329: LCD font smoothing performance
• RT-24565: Beagle: Complex Text implementation generates big swing in
frame rate
• RT-24941: 8.0-graphics-scrum-h90: GlyphCache.render() takes up to
200ms which results in jerky rendering
• RT-26111: Use glyph bounding boxes to get visual bounds
• RT-26894: String rendering is less performant than java2D one
Scene Graph
• *RT-23346*: Provide API access to multiple hardware screen layers
• RT-5477: Improve performance and reduce garbage when animating
gradients
• RT-5525: Group will get bounds change notification when child's
bounds change, even if change in child didn't alter bounds of Group
• RT-9390: Improve picking performance using Dmitri's algorithm (or
other)
• RT-9571: Consider adding image caching for images loaded from remote
URLs
• RT-10604: Recomputing bounds when effects are used even if not dirty
• RT-10681: Reevaluate only changed KeyFrames
• RT-12105: Fix for RT-11562 disables an optimization for calculating
content bounds
• RT-12136: SortedList possible optimizations
• RT-12137: FilteredList possible optimizations
• RT-12564: Layout spends considerable time in getManagedChildren
• RT-12715: Node.toBack()/toFront() are inefficient
• RT-13593: Performance of PathTransition sucks
• RT-19221: Padding for round cap could be optimized in Line
• RT-19222: Optimize impl_configShape of Path
• RT-20455: Do not always recreate the whole geometry in calls to
impl_configShape
• RT-23312: OutOfMemoryError after pressing Ctrl+Alt+Del or minimizing
the window whilst animating a canvas
• RT-24587: Changing a single child of FlowLayout is slower than
changing all children
• RT-26007: Mouse event post-processing does unnecessary work, may be
incorrect altogether
• RT-29717: Do not wrap notifications in ObservableList wrappers when
no listeners are set
Prism
• *RT-15118*: Need to consider architectural changes for doing
transforms in prism
• *RT-15839*: Complex animated content in a ScrollPane is jerky
although little is seen
• *RT-17396*: Shader based 2D path rendering
• *RT-17582*: Render the scene using retained data structures
• *RT-20356*: PresentingPainter and UploadingPainter disregarding dirty
clip rect
• *RT-20405*: Improve Path rendering performance
• *RT-23371*: FB: Render windows on separate hardware layers
• *RT-23450*: Improve performance of Prism rendering and clipping
• *RT-23462*: Create "CommandBuffer" for storing graphics drawing
commands in Prism
• *RT-24168*: View.uploadPixels could take a source rectangle to upload
only a portion of the pixels
• *RT-30271*: No culling if the only dirty region contains the clip
• *RT-30361*: Consider rendering directly to frame buffer instead of RTT
• *RT-30440*: Eliminate redundant OpenGL calls
• ***RT-30741***: Super Shader
• *RT-30746*: don't fill transparent rectangles, cache more textures
to avoid buffer flush
• *RT-30748*: Use Vertex Shader to provide clipping instead of Scissor
test
• RT-5835: Fix for RT-5788 disabled an optimization for anti-aliased
rectangles
• RT-6968: Prism should support 2-byte gray-alpha .png format
• RT-8722: Strokes and fills of Paths slower than flash
• RT-9682: Optimize shadow effects for rounded rectangles
• RT-10369: Optimize blurs in shaders
• RT-12400: Delays in D3D Present affect performance
• RT-14058: Consider possibility to eliminate using of
BasicStroke.tmpMiter
• RT-14216: MultipleArrayGradient uses a lot of memory
• RT-14358: Insertion sort in OpenPisces ScanlineIterator may be very
inefficient
• RT-14421: Branch YCbCr shader may reduce performance on slower
hardware
• RT-15516: image data associated with cached nodes that are removed
from a scene are not aggressively released
• RT-17507: Optimize non-uniform round rect rendering in Regions
• RT-17510: Improve performance of rendering a TRANSPARENT stage on
Windows 7
• RT-17551: MacOS: Optimize using lockFocusIfCanDraw
• RT-18060: Evaluate whether enabling multithreaded GL engine on Mac
benefits Mac JFX performance
• RT-18140: Consider using nearest-neighbor when smooth=false for SW
pipeline to improve performance
• RT-18417: Investigate Mac runtime code for possible native code
optimizations using GCD (Grand Central Dispatch)
• RT-19556: Consider removing usage of DirectByteBuffer and
ByteBuffer.allocateDirect
• RT-19576: Pixel readback performance for the ES2 pipeline has room
for improvement
• RT-21025: iOS: DirtyAreaTest on iOS is slower than we like
• RT-22430: Use 'fillQuad' vs. 'fillRect' for pixel aligned rectangular
regions
• RT-22431: Optimize Charts drawing to use filled quads
• RT-23464: Reduce Vertex Buffer Overhead: Constant Color Attribute vs.
Array Color Attributes
• RT-23465: Using TriangleStrip instead of Triangles
• RT-23466: Improve Vertex Buffer Usage: Structure of Arrays vs. Array
of Structures
• RT-23471: Add new Etched effect
• RT-23574: Add support for tiled rendering of textures (both for
performance and functional reasons)
• RT-23575: Need a more compact representation for text data
• RT-23576: Ability to add hand-coded shaders (bypassing JSL)
• RT-23577: Support for geometry shaders on graphics chips that support
it
• RT-23581: Add ability to render 9-slice directly in Prism graphics
• RT-23725: Beagleboard: Execute fragment shader on the GPU causes
significant drop in performance
• RT-23742: Gradient is slow on embedded systems
• RT-24104: Native Pisces rasterizer is slower on desktop Linux
platforms
• RT-24339: Add a short-cut to dirty region code based on parent /
child bounds ratio
• RT-24557: ImagePattern is slow on embedded systems
• RT-24624: prism-sw pipeline is up to 90% worse than j2d pipeline
• RT-25166: Path updates in a ScrollPane where content has a Scale
transform are 100 times slower
• RT-25603: Mac optimization: Investigate layers async vs sync setting
• RT-25694: Rewrite (AA)TessShapeRep classes in order to avoid
unnecessary translations
• RT-25864: New "shared textures" do not share pixel update flags as
well as they should
• RT-26531: Provide independent stage performance
• RT-28222: Don't render transparent rectangles
• RT-28305: NGRegion optimizations based on Color.TRANSPARENT are
ineffective
• RT-28670: Create a roundrect renderer that uses the new "texture
primitive" based shaders used currently for ellipses and rects
• RT-28752: Mac: 8.0-graphics-scrum-792: up to 30% performance
regression on MacOS
• RT-29542: FX 8 3D: Mesh computation code needs major clean up or redo
• RT-30360: Create fewer temporary objects in Quantum
• RT-30589: preprocess remove comments from ES2 3D shaders
• RT-30710: 8.0-graphics-scrum-1194: 20% performance regression in
Bitmap benchmarks in SW pipeline
• RT-30745: Remove Flush & Finish in ES2SwapChain
• RT-30747: Introduce a low cost clipping API for simple rectangle
based clipping
Media
• RT-11379: video playing with MediaPlayer slows down refreshes to
Java2D component
• RT-16420: MediaPlayer/View loses frames from video streams encoded at
25,30,60 fps
• RT-17861: Use shaders to assist video decoding on the GPU
• RT-20890: Too many open files and Memory leak
Web
• RT-24320: WebView draws entire back buffer on screen upon every
repaint
• RT-24998: Please enable Javascript JIT for 64 bit
• RT-16848: Optimize Unicode implementation
• RT-18909: Extend support for composite operations in Prism Graphics
• RT-19625: Better support for Webnode to improve rendering performance
• RT-20501: Prism needs to provide proper APIs to support the Webnode
team to improve webnode rendering performance
• RT-21629: Slow and never-ending rendering of page
• RT-21722: html5 video inside is slow
• RT-22008: Zero size WCGraphicsPrismContext.Layer handling is not
perfectly efficient
• RT-30083: netflix.com: vertical scrollbar is tremendously slow
Threading
• *RT-2893*: Enable multi-threaded processing of software-based effects
when >= 2 cores available
• *RT-26702*: Poor DisplacementMap effect performance on Mac
Interop
• RT-22133: Performance: JavaFX Webview
QuantumRenderer$PipelineRunnable.run() and WinApplication._runLoop() take up
more than half the time in a JDeveloper operation
• RT-22567: Minor tweaks to FX/Swing painting
• RT-22705: Simple animation runs at lower FPS when embedded into
JFXPanel
• RT-24278: JFXPanel with simple animation consumes entire CPU core
• RT-26993: Noticeable jerkiness when running JFXPanelBitmapBenchmark
on MacOS
Benchmarks
• RT-7644: Math.floor and Math.ceil take up a lot of cpu time
Controls
• *RT-24105*: TabPane renders content of all tabs even though only one is
active
• *RT-30452*: Setting clip on TableCellSkinBase is incorrect
• *RT-30552*: Label: resolve LabelSkinBase's use of clips for text
• *RT-30568*: Reduce unnecessary calls to setManaged(true) in Controls
• *RT-30576*: Parent: add new public layout method, optimized to only
layout this parent and its children
• *RT-30648*: Investigate API for TabPane's Tab Content Loading policy
• RT-9094: VirtualFlow requests data from model too frequently
• RT-10034: Performance optimizations around SelectionModel
implementations
• RT-13792: Investigate caching in controls (NOTE: Unlikely to be any
win)
• RT-16529: Memory Leak: event handlers of root TreeItem are not removed
• RT-16853: TextArea: performance issue
• RT-18934: TextArea.appendText/deleteText may be very slow
• RT-20101: [ComboBox] Custom string converter is applied too many times
• RT-23825: Controls need a lifecycle API
• RT-24102: CSS Loading: Split caspian.css into multiple smaller
component parts.
• RT-25652: Memory Leak in TabPane
• RT-25801: 8.0-controls-scrum-h81: 25% performance regression in
Controls.RadioButton on mac-low end machine
• RT-26716: Performance of scrolling TreeView tail is much slower
than scrolling TreeView head
• RT-26999: 8.0-controls-scrum-h122: up to 20% regression in some
Controls.TableView benchmarks
• RT-27725: 8.0-controls-scrum-h186: 22% footprint increase in
ChoiceBox control
• RT-27986: Spinning progress indicator overlapping an image plays
havoc with RDP
• RT-29055: java.lang.OutOfMemoryError: Java heap space error in
switching between caspian to modena theme in Modena App
• RT-30305: 8.0-controls-scrum-569: 42% performance regression in
Controls.ListView-Keyboard
• RT-30713: VirtualFlow creates new cells in some instances
• RT-30824: TableView TableCell memory issue in javaFX 8.x
Embedded
• *RT-30721*: Provide flag to turn on PRESERVED mode in EGL
• *RT-30722*: Provide an option for 16-bit opaque frame buffer on the
Raspberry PI
• *RT-30723*: EGL: Disable clipping when clearing frame buffer
• RT-24685: Virtual keyboard initialization is slow
• RT-24937: Use a C/C++ compiler that can take advantage of NEON
• RT-25943: Need to consider specific OpenGL extension on embedded
system
• RT-25995: Prism porting layer function to query platform VRAM
• RT-27590: Evaluate effect of ProGuard on runtime size
• RT-28012: EGLFB: RAM allocation should be reduced
• RT-28029: Improve EGLFB dialog / popup response time
• RT-30719: Enable video underlays on Raspberry PI
CSS
• *RT-28966*: CSS creates new objects for complex values which trigger
redundant processing including rendering
• *RT-30381*: fx8.0-b86: CSS code for modena css rules with multiple
selectors is not optimized
• RT-11506: Short circuit CSS if CSS is not relevant to the Node
• RT-11881: Some css selectors in caspian.css will turn the CSS
processing on for all the parents
• RT-11882: Under current conditions, every Node is processing CSS
• RT-23468: Remove use of List in CSS internals in favor of arrays
• RT-30817: lazy deserialization of css declarations
• RT-30818: CSS: Avoid creating ObservableList for declarations and
selectors in Rule
FXML
• *RT-23527*: Compile FXML to .class file
Tooling
• RT-13312: Develop GLBenchmark to get baseline performance on any
particular hardware
• RT-13313: Performance framework (GPU usage)
• RT-18326: Implement performance counters (prism.printStats) feature
for prism-es2 pipe
• RT-26560: Option to track texture memory allocation
• RT-30651: 8.0-graphics-scrum-1216: full speed mode seems to be broken
Startup
• RT-14930: JNLP-start consumes large amount of time
• RT-20159: Startup regression in controls scrum #371
On Jul 3, 2013, at 9:56 AM, Richard Bair wrote:
>> Obviously there's a lot going on with the move to gradle, but we are a few
>> lines of Gradle build code away from JFX on iOS. I'm keen to find out just
>> how well it will run.
>
> In the runs I've seen (not on RoboVM) the main bottleneck is in graphics
> rendering. We don't know specifically why yet, but we have a lot of ideas.
> Now that Tobi reports FX + RoboVM (including fonts!) is working, I'm eager to
> see the performance characteristics as well.
>
> With the work you've done on the developer workflow and now that we've got an
> open build running on the device, we are going to need to get organized
> around measuring, reporting, and fixing performance issues encountered on the
> device. Likely some of it will be RoboVM related, but there is plenty of
> optimization to do in Prism as well.
>
> We've learned a lot about embedded hardware over the last year or so. Some of
> the things we've learned:
> - It is almost *always* fill rate limited
> - Pixel shader complexity costs you
> - CPU -> GPU bandwidth is very limited
>
> Solving the fill rate issue is huge. The Android team reckons that you can
> overwrite the same pixel maybe 2x before you start noticeably losing
> performance, 3x or more and you're dead. It doesn't even matter what it is
> you are doing per-pixel (could be simply filling each pixel with a solid
> color). The fact that you are running a pixel shader for 3x or 4x the number
> of pixels taxes the hardware.
>
> So for example, right now I believe we are doing 3x overdraw before we even
> do anything. I think first we do a clear, then we fill with black, then we
> fill with the Scene fill color. Then we draw whatever you give us. Obviously
> this is not optimal!
>
> For pixel shader complexity -- you can probably get away with more complex
> pixel shaders if they are only running 1x per pixel, but when they are
> running 3x or 4x per pixel then the complexity of the pixel shaders burns
> you. We did a lot of optimizations here already so hopefully we've got this
> one in good shape. But just something to be aware of.
>
> The CPU -> GPU bandwidth problem is one that is systemic with all these
> mobile devices. Higher bus speeds == less battery life, so the devices are
> designed with low bus speeds and this makes transfer of data between CPU and
> GPU costly. Games will typically do all the transfer once up front (all the
> graphics assets for a level are loaded up front) and then during the game
> they are just adjusting the viewport & vertices (often in vertex shaders so
> as not to pass much data down to the card), etc. Right now we are doing a
> tremendous amount of communication with the GPU. Ironing this out is the
> basis for the "super shader" (https://javafx-jira.kenai.com/browse/RT-30741).
>
> I would recommend anybody interested in performance keep the "Open
> Performance Issues" filter on their JIRA dashboard. There is a link to 221
> performance issues (most of which are ideas about things to do to improve
> performance). We also need to close the loop on the other issues we were
> discussing about jerkiness a couple weeks ago.
>
> Richard
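
To put rough numbers on the overdraw point in the quoted message, here is a minimal back-of-the-envelope sketch. The resolution and frame rate are assumptions picked for illustration, and the helper name is hypothetical. It only counts full-screen pixel writes; it does not measure anything on real hardware.

```java
// Rough estimate of pixel writes caused by full-screen passes (overdraw).
public class OverdrawEstimate {

    /** Pixel writes per frame for a given number of full-screen passes. */
    static long pixelWritesPerFrame(int width, int height, int fullScreenPasses) {
        return (long) width * height * fullScreenPasses;
    }

    public static void main(String[] args) {
        int width = 1280;   // assumed display width
        int height = 720;   // assumed display height
        int fps = 60;       // assumed target frame rate

        // The three passes described in the message: clear, black fill,
        // Scene fill. All happen before any application content is drawn.
        int baselinePasses = 3;

        long perFrame = pixelWritesPerFrame(width, height, baselinePasses);
        long perSecond = perFrame * fps;

        System.out.println("Pixel writes per frame:  " + perFrame);   // 2764800
        System.out.println("Pixel writes per second: " + perSecond);  // 165888000
    }
}
```

Under these assumptions, dropping from three baseline passes to one would remove about 1.8 million pixel writes per frame before the scene content is even considered.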