Hi Christopher,

Thanks. I had to adapt your code to reuse listeners because I ran out of
memory. So far I haven't found anything, but I will keep looking. At least
this approach makes it possible to give selected users a version with the
checks in it, which isn't possible with the conditional breakpoint approach.

Dean

On Wed, Jun 25, 2025 at 2:37 PM Christopher Schnick <crschn...@xpipe.io> wrote:

> (Resending this mail since it somehow didn't make it to the mailing list
> the first time)
>
> This issue vanished for us after reworking the application, including
> implementing more fixes for non-platform thread access. I can't say
> definitively whether this caused it, but there were some rare instances for
> us where some properties were changed from the wrong thread.
>
> An easy way to check the platform thread access for everything, without
> having to implement explicit asserts everywhere, is the listener-based
> approach I implemented here:
> https://mail.openjdk.org/pipermail/openjfx-dev/2025-April/053212.html
>
> It might still be possible to encounter this issue when using only the
> platform thread, as you explained. Better error handling of this
> situation in JavaFX would already make the issue less severe. A proper fix
> to prevent this from happening would be even better, but I have no idea how
> feasible this is.
>
>
> On 25/06/2025 11:06, Dean Wookey wrote:
>
>
> Hi Everyone,
>
> We've also been experiencing this problem over the years. It seems to
> be related to JDK-8198577.
>
> Once it goes wrong, each pulse hits the issue repeatedly, meaning it can
> never escape. It's rare, but extremely disruptive when it does occur
> because the user loses what they've been working on and has to restart the
> app.
>
> I've tried really hard to figure out the conditions this happens in. I
> don't think it's a multiple thread issue (although for some people it
> almost certainly could be triggered that way) because we've put conditional
> breakpoints that trigger whenever anything that could affect dirty children
> is done off the app thread. We've got assert
> Platform.isFxApplicationThread() all over our app to make sure the
> threading is happening properly.
>
> What I think is happening is that getChildTransformedBounds, which is
> called inside the updateCachedBounds loop, can in some rare cases end up
> triggering a call to updateCachedBounds on the same node. Basically,
> updateCachedBounds can call itself recursively. This is a snippet from
> updateCachedBounds in Parent.java:
>
>     // this checks the newly added nodes first, so if dirtyNodes is the
>     // whole children list, we can end early
>     for (int i = dirtyNodes.size() - 1; remainingDirtyNodes > 0; --i) {
>         final Node node = dirtyNodes.get(i);
>         if (node.boundsChanged) {
>             // assert node.isVisible();
>             node.boundsChanged = false;
>             --remainingDirtyNodes;
>             tmp = getChildTransformedBounds(node,
>                     BaseTransform.IDENTITY_TRANSFORM, tmp);
>
> In the code above, if this gets called recursively through
> getChildTransformedBounds, then node.boundsChanged will change to false for
> all the nodes, which stops remainingDirtyNodes from being updated, and i
> eventually goes negative.
>
> We tried to fix the scene graph when this happens by catching the
> exception in the Thread.setDefaultUncaughtExceptionHandler, but it didn't
> work. Maybe Christopher's suggested fix would work, but as Kevin says: "It
> needs to be tested to ensure that when we get the AIOOBE that we can
> recover. It wouldn't solve anything if we catch and log that exception only
> to have it fail shortly after because the scene graph isn't in a good state
> (I don't know whether that would be the case, but it's something that needs
> to be checked)."
>
> Here's how we tried to fix the scene graph when we caught the error.
> The "Fixing IOB Issue" log gets hit all the time, but it doesn't find any
> problems, and in the next pulse it hits the problem again with various
> stack traces until it settles on one. In our latest example of
> the error, it first occurred during a Platform.runLater and not during the
> pulse, but then all subsequent issues happened during the pulse.
>
>     protected static void checkSpecialException(Throwable t) {
>         if (t instanceof IndexOutOfBoundsException) {
>             fixIndexOutOfBounds(t);
>         }
>     }
>
>     public static void fixIndexOutOfBounds(Throwable throwable) {
>         FXUtilities.log(EmbraceDesktop.class,
>                 org.slf4j.event.Level.INFO, "Fixing IOB Issue");
>         try {
>             // Reach into Parent's private dirty-children bookkeeping.
>             Field dirtyChildrenCountField =
>                     Parent.class.getDeclaredField("dirtyChildrenCount");
>             dirtyChildrenCountField.setAccessible(true);
>             Field dirtyChildrenField =
>                     Parent.class.getDeclaredField("dirtyChildren");
>             dirtyChildrenField.setAccessible(true);
>             Set<Scene> apps = applicationManager.getApplications();
>             ArrayList<Node> brokenStack = new ArrayList<>();
>             for (Scene s : apps) {
>                 fixTreeRecursive(dirtyChildrenCountField,
>                         dirtyChildrenField, s.getRoot(), brokenStack);
>             }
>             if (!brokenStack.isEmpty()) {
>                 StringBuilder errorStack = new StringBuilder();
>                 for (Node n : brokenStack) {
>                     errorStack.append(n.getClass().getSimpleName() + " "
>                             + String.join(",", n.getStyleClass())).append("\n");
>                 }
>                 EmbraceAnalytics.logCrash("Index out of bounds crash",
>                         errorStack.toString(), throwable);
>             }
>         }
>         catch (Throwable t2) {
>             FXUtilities.log(EmbraceDesktop.class,
>                     org.slf4j.event.Level.ERROR, "Exception while fixing tree", t2);
>         }
>     }
>
>     protected static boolean fixTreeRecursive(Field dirtyChildrenCountField,
>             Field dirtyChildrenField, Parent parent,
>             ArrayList<Node> brokenStack) throws IllegalAccessException {
>         List<?> dirtyChildren = (List<?>) dirtyChildrenField.get(parent);
>         int dirtyChildrenCount = (int) dirtyChildrenCountField.get(parent);
>         if (dirtyChildren != null) {
>             // The counter must never exceed the size of the dirty list.
>             if (dirtyChildrenCount > dirtyChildren.size()) {
>                 FXUtilities.log(EmbraceDesktop.class,
>                         org.slf4j.event.Level.ERROR, "Offending node1 was " +
>                         parent.getClass().getSimpleName());
>                 dirtyChildrenCountField.set(parent, dirtyChildren.size());
>                 brokenStack.add(parent);
>                 return true;
>             }
>         }
>         else {
>             // Without a dirty list, the counter is bounded by the child count.
>             if (parent.getChildrenUnmodifiable().size() < dirtyChildrenCount) {
>                 FXUtilities.log(EmbraceDesktop.class,
>                         org.slf4j.event.Level.ERROR, "Offending node2 was " +
>                         parent.getClass().getSimpleName());
>                 dirtyChildrenCountField.set(parent,
>                         parent.getChildrenUnmodifiable().size());
>                 brokenStack.add(parent);
>                 return true;
>             }
>         }
>         for (Node n : parent.getChildrenUnmodifiable()) {
>             if (n instanceof Parent) {
>                 boolean error = fixTreeRecursive(dirtyChildrenCountField,
>                         dirtyChildrenField, (Parent) n, brokenStack);
>                 if (error) {
>                     brokenStack.add(parent);
>                     FXUtilities.log(EmbraceDesktop.class,
>                             org.slf4j.event.Level.ERROR, "Parent was " +
>                             parent.getClass().getSimpleName());
>                     return true;
>                 }
>                 // No error in this subtree: keep scanning the remaining
>                 // children instead of returning after the first Parent.
>             }
>         }
>         return false;
>     }
>
> I think we should put the potential index check fix in and log when it
> happens. As far as we can tell, if this issue gets hit, it's
> catastrophic 100% of the time. The fix might resolve the issue. It can't
> really make it any worse. Another thing we should do is add a check for
> recursive entry to that method and log when that occurs. That's (I think)
> the real issue, and without a stack trace of that, it's hard to find the
> root cause.
>
> I don't know if anyone else has experienced this issue and has
> insights/workarounds?
>
> Dean
>
>
> On Mon, Mar 24, 2025 at 5:22 PM Christopher Schnick <crschn...@xpipe.io>
> wrote:
>
>> Hello,
>>
>> We encountered an issue after updating our application implementation to
>> frequently change the visibility of nodes. We are essentially now running
>> an implementation that very frequently changes the visibility of various
>> child nodes based on when they are needed and shown. When the user
>> performs a lot of actions, the visibility of many nodes will be changed
>> rapidly.
>>
>> For that, there are many listeners in place that listen for bounds
>> changes of nodes to recheck whether they need to be made visible or not.
>> All the visibility changes are queued up, so they are not immediately done
>> in the listener after any bounds changes of parents. They are all properly
>> done on the platform thread with runLater. When this implementation is
>> running on many client systems, we sometimes receive an error report with
>> an exception that looks something like this:
>>
>> java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 2
>>     at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
>>     at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
>>     at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
>>     at java.base/java.util.Objects.checkIndex(Objects.java:365)
>>     at java.base/java.util.ArrayList.get(ArrayList.java:428)
>>     at javafx.base@25-ea/com.sun.javafx.collections.ObservableListWrapper.get(ObservableListWrapper.java:88)
>>     at javafx.base@25-ea/com.sun.javafx.collections.VetoableListDecorator.get(VetoableListDecorator.java:326)
>>     at javafx.graphics@25-ea/javafx.scene.Parent.updateCachedBounds(Parent.java:1769)
>>     at javafx.graphics@25-ea/javafx.scene.Parent.recomputeBounds(Parent.java:1713)
>>     at javafx.graphics@25-ea/javafx.scene.Parent.doComputeGeomBounds(Parent.java:1566)
>>     at javafx.graphics@25-ea/javafx.scene.Parent$1.doComputeGeomBounds(Parent.java:116)
>>     at javafx.graphics@25-ea/com.sun.javafx.scene.ParentHelper.computeGeomBoundsImpl(ParentHelper.java:84)
>>     at javafx.graphics@25-ea/com.sun.javafx.scene.layout.RegionHelper.superComputeGeomBoundsImpl(RegionHelper.java:78)
>>     at javafx.graphics@25-ea/com.sun.javafx.scene.layout.RegionHelper.superComputeGeomBounds(RegionHelper.java:62)
>>     at javafx.graphics@25-ea/javafx.scene.layout.Region.doComputeGeomBounds(Region.java:3301)
>>     at javafx.graphics@25-ea/javafx.scene.layout.Region$1.doComputeGeomBounds(Region.java:166)
>>     at javafx.graphics@25-ea/com.sun.javafx.scene.layout.RegionHelper.computeGeomBoundsImpl(RegionHelper.java:89)
>>     at javafx.graphics@25-ea/com.sun.javafx.scene.NodeHelper.computeGeomBounds(NodeHelper.java:101)
>>     at javafx.graphics@25-ea/javafx.scene.Node.updateGeomBounds(Node.java:3908)
>>     at javafx.graphics@25-ea/javafx.scene.Node.getGeomBounds(Node.java:3870)
>>     at javafx.graphics@25-ea/javafx.scene.Node.getLocalBounds(Node.java:3818)
>>     at javafx.graphics@25-ea/javafx.scene.Node.updateTxBounds(Node.java:3972)
>>     at javafx.graphics@25-ea/javafx.scene.Node.getTransformedBounds(Node.java:3764)
>>     at javafx.graphics@25-ea/javafx.scene.Node.updateBounds(Node.java:828)
>>     at javafx.graphics@25-ea/javafx.scene.Parent.updateBounds(Parent.java:1900)
>>     at javafx.graphics@25-ea/javafx.scene.Scene$ScenePulseListener.pulse(Scene.java:2670)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.Toolkit.runPulse(Toolkit.java:380)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.Toolkit.firePulse(Toolkit.java:401)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:592)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.quantum.QuantumToolkit.pulse(QuantumToolkit.java:572)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.quantum.QuantumToolkit.pulseFromQueue(QuantumToolkit.java:565)
>>     at javafx.graphics@25-ea/com.sun.javafx.tk.quantum.QuantumToolkit.lambda$runToolkit$6(QuantumToolkit.java:346)
>>     at javafx.graphics@25-ea/com.sun.glass.ui.InvokeLaterDispatcher$Future.run$$$capture(InvokeLaterDispatcher.java:95)
>>     at javafx.graphics@25-ea/com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java)
>>
>> The out-of-bounds index is not always the same; there are several
>> variations of this. It happens on all operating systems. It seems like
>> there is a very specific scenario where an index can be out of bounds.
>> This happens very rarely, like only a few times out of some hundred
>> application runs, so I tried my best at forcing it to reproduce.
>>
>> The following reproducer works most of the time, but it might have to be
>> run multiple times. I am aware that it eventually results in a
>> StackOverflowError, but that was the best way to force it reliably, by just
>> continuously spamming visibility changes to eventually encounter this rare
>> issue. I want to emphasize that the same error also occurs naturally
>> when not forced like this; it is just a lot rarer. So the
>> StackOverflowError in the reproducer has nothing to do with this issue, it
>> also happens later on.
>>
>> import javafx.application.Application;
>> import javafx.scene.Scene;
>> import javafx.scene.control.Button;
>> import javafx.scene.layout.Region;
>> import javafx.scene.layout.StackPane;
>> import javafx.scene.layout.VBox;
>> import javafx.stage.Stage;
>>
>> import java.io.IOException;
>>
>> public class ParentBoundsBug extends Application {
>>
>>     @Override
>>     public void start(Stage stage) throws IOException {
>>         Scene scene = new Scene(createContent(), 640, 480);
>>         stage.setScene(scene);
>>         stage.show();
>>         stage.centerOnScreen();
>>     }
>>
>>     private Region createContent() {
>>         var b1 = new Button("Click me!");
>>         var b2 = new Button("Click me!");
>>         var vbox = new VBox(b1, b2);
>>         // Every bounds change toggles the visibility of the VBox, which
>>         // in turn triggers further bounds changes.
>>         b1.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>             vbox.setVisible(!vbox.isVisible());
>>         });
>>         b2.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>             vbox.setVisible(!vbox.isVisible());
>>         });
>>         vbox.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>             vbox.setVisible(!vbox.isVisible());
>>         });
>>
>>         var stack = new StackPane(vbox, new StackPane());
>>         stack.boundsInParentProperty().addListener((observable, oldValue, newValue) -> {
>>             vbox.setVisible(!vbox.isVisible());
>>         });
>>         return stack;
>>     }
>>
>>     public static void main(String[] args) {
>>         launch();
>>     }
>> }
>>
>>
>> It doesn't necessarily have something to do with running the visibility
>> change directly in the listener; our application does a runLater to change
>> the visibility state, still with the same results. To properly debug this,
>> you will have to launch the reproducer with a bigger stack size like -Xss8m
>> to increase the chance that it occurs. Then, you can just set a breakpoint
>> at jdk.internal.util.Preconditions:302, and wait for it to trigger the OOB
>> eventually.
>>
>> This problem is currently the biggest JavaFX issue for us as it breaks
>> the layout and usually requires a restart to fix.
>>
>> Looking at the bounds calculation code, the list access is very
>> optimistic in that it doesn't check any indices and relies on multiple
>> assumptions holding. So if it is very difficult to find the cause, a simple
>> index bounds check for the list access would also work fine.
>>
>> Best
>> Christopher Schnick
>>
>
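
To make the re-entrancy hypothesis from this thread easier to reason about, here is a minimal plain-Java sketch of it. It is not JavaFX code and not the actual Parent implementation: FakeNode, onBoundsQuery and the guarded switch are hypothetical names, and the model assumes that each call counts the dirty flags on entry. The loop has the same shape as the quoted updateCachedBounds snippet, with the hook standing in for getChildTransformedBounds. The guarded mode shows what the proposed index check plus a recursive-entry check might look like.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the dirty-children loop. None of this is JavaFX API;
// FakeNode, onBoundsQuery and the 'guarded' switch are hypothetical names.
class ReentrantBoundsSketch {

    /** Stand-in for a scene-graph node with a "bounds changed" flag. */
    static class FakeNode {
        boolean boundsChanged = true;
        Runnable onBoundsQuery = () -> {};  // hook that may re-enter update()
    }

    final List<FakeNode> dirtyNodes = new ArrayList<>();
    int depth;                // current nesting depth of update()
    boolean reentryDetected;  // set when update() is entered while running
    boolean guardTripped;     // set when the guarded loop stops at i < 0

    ReentrantBoundsSketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            dirtyNodes.add(new FakeNode());
        }
    }

    /** Same loop shape as the quoted snippet; 'guarded' adds an index check. */
    void update(boolean guarded) {
        if (depth > 0) {
            reentryDetected = true;  // a real check would log a stack trace here
        }
        depth++;
        try {
            // Assumption of this model: count the dirty flags on entry.
            int remaining = 0;
            for (FakeNode n : dirtyNodes) {
                if (n.boundsChanged) {
                    remaining++;
                }
            }
            for (int i = dirtyNodes.size() - 1; remaining > 0; --i) {
                if (guarded && i < 0) {
                    guardTripped = true;  // bail out instead of throwing
                    break;
                }
                FakeNode node = dirtyNodes.get(i);  // throws once i reaches -1
                if (node.boundsChanged) {
                    node.boundsChanged = false;
                    --remaining;
                    node.onBoundsQuery.run();  // models getChildTransformedBounds
                }
            }
        } finally {
            depth--;
        }
    }

    public static void main(String[] args) {
        ReentrantBoundsSketch sketch = new ReentrantBoundsSketch(3);
        // The last node's hook re-enters the loop, which clears the flags
        // that the outer loop still expects to find.
        sketch.dirtyNodes.get(2).onBoundsQuery = () -> sketch.update(false);
        try {
            sketch.update(false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unguarded loop failed: " + e.getMessage());
        }
        System.out.println("Re-entry detected: " + sketch.reentryDetected);
    }
}
```

In this model the nested call clears every remaining flag, so the outer loop never decrements its counter again and walks off the front of the list. The guard only keeps the loop from throwing; whether the real scene graph is left in a usable state afterwards is exactly the open question raised earlier in the thread.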