Hi Karl, thanks for the submission. I'm sending this to awt-dev and bcc'ing jdk8u-dev, as that's a more appropriate venue for this discussion.
-Rob

On 11/01/17 08:44, Karl von Randow wrote:
> I have encountered a deadlock in Java 1.8.0_112 when changing between
> discrete and integrated GPU on a retina MacBook Pro. The deadlock is between:
>
> CGLGraphicsConfig.getCGLConfigInfo, running on AWT.EventQueue, trying to call
> [GraphicsConfigUtil _getCGLConfigInfo:] on the main thread (AppKit thread)
> while it holds the AWT lock and is synchronized on CGraphicsEnvironment.
>
> and
>
> A) the AppKit main thread trying to call
> CGraphicsEnvironment._displayReconfiguration (via displaycb_handle in
> CGraphicsEnv.m) and synchronizing on CGraphicsEnvironment, which deadlocks.
>
> or
>
> B) the AppKit main thread trying to render, and trying to acquire the
> OGLRenderQueue lock (which is the AWT lock).
>
>
> SUPPORTING STACK DUMPS
>
> - SCENARIO A
>
> CGraphicsEnvironment._displayReconfiguration has been called on the main
> thread since 8041900: [macosx] Java forces the use of discrete GPU
> (https://bugs.openjdk.java.net/browse/JDK-8041900), which appears as
> changeset 11227.
> In the native thread dump below you can see the frame for displaycb_handle,
> which is the block dispatched to the main thread to call
> CGraphicsEnvironment._displayReconfiguration.
> > Java stacks > > "AWT-EventQueue-0" #16 prio=6 os_prio=31 tid=0x00007fbc72a0a800 nid=0x1251f > runnable [0x000070000e443000] > java.lang.Thread.State: RUNNABLE > at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method) > at > sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147) > at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64) > at > sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163) > - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment) > at > sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181) > - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment) > at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415) > at > net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260) > at > net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119) > [SNIP] > at java.awt.Container.layout(Container.java:1510) > at java.awt.Container.doLayout(Container.java:1499) > at java.awt.Container.validateTree(Container.java:1695) > [SNIP] > at > javax.swing.RepaintManager$ProcessingRunnable.run(RepaintManager.java:1750) > [SNIP] > at > java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201) > at > java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116) > at > java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105) > at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) > at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93) > at java.awt.EventDispatchThread.run(EventDispatchThread.java:82) > > "AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fbc75046800 nid=0x307 > waiting for monitor entry [0x00007fff5b579000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > sun.awt.CGraphicsEnvironment._displayReconfiguration(CGraphicsEnvironment.java:129) > - waiting 
to lock <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment) > > Native stacks > > Thread 0x6828c3 DispatchQueue 1 Thread name "AppKit > Thread" 1000 samples (1-1000) priority 46 (base > 46) cpu time <0.001 > 1000 start + 52 (Charles + 5156) [0x104682424] > 1000 main + 153 (Charles + 5321) [0x1046824c9] > 1000 launch + 10872 (Charles + 16520) [0x104685088] > 1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x104702624] > 1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781) > [0x1047068fd] > 1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380) > [0x7fff97665974] > 1000 __CFRunLoopRun + 934 (CoreFoundation + 556918) > [0x7fff97665f76] > 1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741) > [0x7fff97666a7d] > 1000 > __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 > (CoreFoundation + 686465) [0x7fff97685981] > 1000 __NSThreadPerformPerform + 326 (Foundation + > 465034) [0x7fff990c988a] > 1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib > + 286207) [0x113081dff] > 1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156 > (libosxapp.dylib + 8525) [0x1130fa14d] > [SNIP] > 1000 +[JNFRunLoop > _performCopiedBlock:] + 17 (JavaNativeFoundation + 28474) [0x112d0df3a] > 1000 > __displaycb_handle_block_invoke_1 + 172 (libawt_lwawt.dylib + 119659) > [0x11305936b] > 1000 > JNFPerformEnvBlock + 87 (JavaNativeFoundation + 27229) [0x112d0da5d] > 1000 > __displaycb_handle_block_invoke_2 + 80 (libawt_lwawt.dylib + 119988) > [0x1130594b4] > 1000 > JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x112d0a5af] > 1000 > jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x106301539] > 1000 > jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, > _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227) > [0x10630ec03] > 1000 > JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, > Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396] > 1000 ??? > [0x113db94e7] > 1000 ??? 
> [0x113dde021] > 1000 > InterpreterRuntime::monitorenter(JavaThread*, BasicObjectLock*) + 165 > (libjvm.dylib + 2995347) [0x1062ef493] > 1000 > ObjectMonitor::enter(Thread*) + 472 (libjvm.dylib + 4524724) [0x106464ab4] > > 1000 ObjectMonitor::EnterI(Thread*) + 532 (libjvm.dylib + 4521584) > [0x106463e70] > > 1000 os::PlatformEvent::park(long) + 404 (libjvm.dylib + 4561328) > [0x10646d9b0] > > 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86] > > *1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2] > > Thread 0x68292c Thread name "Java: AWT-EventQueue-0" > 1000 samples (1-1000) priority 31 (base 31) > 1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd] > 1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839) > [0x7fffacdd59f7] > 1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019) > [0x7fffacdd5aab] > 1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x106470d2a] > 1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10654f748] > 1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib + > 5480593) [0x10654e091] > 1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib + > 3270354) [0x1063326d2] > 1000 JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*) + 74 (libjvm.dylib + 3017936) > [0x1062f4cd0] > 1000 JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*) + 356 (libjvm.dylib + 3017508) > [0x1062f4b24] > 1000 JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396] > [SNIP] > > 1000 Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo > + 279 (libawt_lwawt.dylib + 107562) [0x11305642a] > > 1000 -[NSObject(NSThreadPerformAdditions) > performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation + > 203394) [0x7fff99089a82] > > 1000 -[NSObject(NSThreadPerformAdditions) > 
performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation + > 204424) [0x7fff99089e88] > > 1000 -[NSCondition wait] + 240 (Foundation + 208331) > [0x7fff9908adcb] > > 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + > 105606) [0x7fffaccecc86] > > *1000 psynch_cvcontinue + 0 (pthread + 39138) > [0xffffff7f80f978e2] > > > > > > > - Scenario B > > Java stacks > > "AWT-EventQueue-0" #15 prio=6 os_prio=31 tid=0x00007fba611d2000 nid=0x1260f > runnable [0x0000700005365000] > java.lang.Thread.State: RUNNABLE > at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method) > at > sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147) > at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64) > at > sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163) > - locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment) > at > sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181) > - locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment) > at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415) > at > net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260) > at > net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119) > at net.miginfocom.layout.UnitValue.getPixelsExact(UnitValue.java:305) > at net.miginfocom.layout.UnitValue.getPixels(UnitValue.java:281) > [SNIP] > at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) > at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93) > at java.awt.EventDispatchThread.run(EventDispatchThread.java:82) > > "AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fba59869800 nid=0x307 > waiting on condition [0x00007fff52ac2000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00000006c053b688> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) 
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209) > at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285) > at sun.awt.SunToolkit.awtLock(SunToolkit.java:253) > at sun.java2d.pipe.RenderQueue.lock(RenderQueue.java:112) > at sun.java2d.opengl.CGLLayer.drawInCGLContext(CGLLayer.java:139) > > Native stacks > > Thread 0x6764ca DispatchQueue 1 Thread name "AppKit > Thread" 1000 samples (1-1000) priority 46 (base > 46) > 1000 start + 52 (Charles + 5156) [0x10d139424] > 1000 main + 153 (Charles + 5321) [0x10d1394c9] > 1000 launch + 10872 (Charles + 16520) [0x10d13c088] > 1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x10d1b9624] > 1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781) > [0x10d1bd8fd] > 1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380) > [0x7fff97665974] > 1000 __CFRunLoopRun + 934 (CoreFoundation + 556918) > [0x7fff97665f76] > 1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741) > [0x7fff97666a7d] > 1000 > __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 > (CoreFoundation + 686465) [0x7fff97685981] > 1000 __NSThreadPerformPerform + 326 (Foundation + > 465034) [0x7fff990c988a] > 1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib > + 286207) [0x12abc3dff] > 1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156 > (libosxapp.dylib + 8525) [0x12ac3c14d] > [SNIP] > 1000 > CA::Transaction::observer_callback(__CFRunLoopObserver*, unsigned long, > void*) + 108 (QuartzCore + 69522) [0x7fff9d393f92] > 1000 > CA::Transaction::commit() + 475 (QuartzCore + 67121) 
[0x7fff9d393631] > 1000 > CA::Context::commit_transaction(CA::Transaction*) + 280 (QuartzCore + > 1153144) [0x7fff9d49c878] > 1000 > CA::Layer::layout_and_display_if_needed(CA::Transaction*) + 35 (QuartzCore + > 1196185) [0x7fff9d4a7099] > 1000 > CA::Layer::display_if_needed(CA::Transaction*) + 572 (QuartzCore + 1195886) > [0x7fff9d4a6f6e] > 1000 -[CAOpenGLLayer > _display] + 351 (QuartzCore + 1117583) [0x7fff9d493d8f] > 1000 > CAOpenGLLayerDraw(CAOpenGLLayer*, double, CVTimeStamp const*, unsigned int) + > 873 (QuartzCore + 1118737) [0x7fff9d494211] > 1000 -[CGLLayer > drawInCGLContext:pixelFormat:forLayerTime:displayTime:] + 287 > (libawt_lwawt.dylib + 109022) [0x12ab989de] > 1000 > JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x12a84f5af] > 1000 > jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x10edb8539] > 1000 > jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType, > _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227) > [0x10edc5c03] > 1000 > JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, > Thread*) + 1710 (libjvm.dylib + 3015574) [0x10edab396] > 1000 > ??? [0x10ffa0854] > 1000 > ??? 
[0x11027642a] > > 1000 Unsafe_Park + 126 (libjvm.dylib + 5571927) [0x10f01b557] > > 1000 Parker::park(bool, long) + 495 (libjvm.dylib + 4560765) [0x10ef2477d] > > 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86] > > *1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2] > > Thread 0x67652b Thread name "Java: AWT-EventQueue-0" > 1000 samples (1-1000) priority 31 (base 31) > 1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd] > 1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839) > [0x7fffacdd59f7] > 1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019) > [0x7fffacdd5aab] > 1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x10ef27d2a] > 1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10f006748] > 1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib + > 5480593) [0x10f005091] > 1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib + > 3270354) [0x10ede96d2] > [SNIP] > > 1000 ??? [0x10f7a0734] > > 1000 > Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo + 279 > (libawt_lwawt.dylib + 107562) [0x12ab9842a] > > 1000 -[NSObject(NSThreadPerformAdditions) > performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation + > 203394) [0x7fff99089a82] > > 1000 -[NSObject(NSThreadPerformAdditions) > performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation + > 204424) [0x7fff99089e88] > > 1000 -[NSCondition wait] + 240 (Foundation + 208331) > [0x7fff9908adcb] > > 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + > 105606) [0x7fffaccecc86] > > *1000 psynch_cvcontinue + 0 (pthread + 39138) > [0xffffff7f80f978e2] > > > INTERPRETATION > > The deadlock is a race condition when macOS changes between the discrete and > integrated GPU. 
>
> When the GPU changes, the result of CGraphicsEnvironment.getMainDisplayID()
> changes immediately to return the new display ID (there is a comment in
> CGraphicsEnvironment.m that notes that the display ID changes in this case,
> and I have verified this), while the devices map is only rebuilt when
> initDevices() is called.
>
> CGLGraphicsConfig.getCGLConfigInfo (which is called as a consequence of
> initDevices, as per the stack traces) calls out to the AppKit main thread
> and waits for it. I think this is always dangerous because of the locks held
> by the code calling it. I think we should avoid getCGLConfigInfo being called
> on anything but the AppKit main thread. I believe this was the intention of
> 8041900: [macosx] Java forces the use of discrete GPU
> (https://bugs.openjdk.java.net/browse/JDK-8041900).
>
> CGraphicsEnvironment.getDefaultScreenDevice() is called from AWT layout code
> (as per the stacks) and it calls CGraphicsEnvironment.getMainDisplayID() each
> time.
> If CGraphicsEnvironment.getDefaultScreenDevice() is called _after_ the GPU
> change, but _before_ CGraphicsEnvironment._displayReconfiguration() has been
> called, the CGraphicsDevice for the new display ID cannot be found in the
> devices Map, so initDevices() is called from
> CGraphicsEnvironment.getDefaultScreenDevice() on the AWT-EventQueue thread.
>
> There is a note in getDefaultScreenDevice() for this case:
>     we do not expect that this may happen, the only response is to
>     re-initialize the list of devices
>
> Calling initDevices() here results in a call to
> CGLGraphicsConfig.getCGLConfigInfo, which then calls
> [GraphicsConfigUtil _getCGLConfigInfo:] on the AppKit main thread and waits
> for the result.
>
> As the current thread (AWT-EventQueue) is holding the AWT lock and is
> synchronized on CGraphicsEnvironment, the two deadlock conditions described
> above can occur.
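For readers following the interpretation above, the shape of scenario A can be sketched in plain Java. This is only a toy model under stated assumptions, not the AWT code: a single-threaded executor stands in for the AppKit main thread, ENV_LOCK stands in for the CGraphicsEnvironment monitor, and the class and method names are hypothetical. The real call waits indefinitely; the sketch uses a timeout so that it terminates and can report that the wait would not have completed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MainThreadDeadlockSketch {
    // Hypothetical stand-in for the CGraphicsEnvironment monitor.
    private static final Object ENV_LOCK = new Object();

    /**
     * Returns true if a blocking call onto the "main thread" made while
     * holding ENV_LOCK fails to complete within timeoutMillis.
     */
    public static boolean blockingCallTimesOut(long timeoutMillis) throws Exception {
        // One worker thread plays the role of the AppKit main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch callbackStarted = new CountDownLatch(1);
        try {
            // Stand-in for _displayReconfiguration: runs on the main thread
            // and needs the environment monitor.
            Future<?> reconfig = mainThread.submit(() -> {
                try {
                    lockHeld.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                callbackStarted.countDown();
                synchronized (ENV_LOCK) {
                    // The device list would be rebuilt here.
                }
            });

            boolean timedOut;
            // The calling thread plays the role of AWT-EventQueue.
            synchronized (ENV_LOCK) {
                lockHeld.countDown();
                callbackStarted.await();
                // Stand-in for getCGLConfigInfo: queue work on the main
                // thread and wait for it while still holding the monitor.
                Future<String> info = mainThread.submit(() -> "config");
                try {
                    info.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    timedOut = false;
                } catch (TimeoutException e) {
                    // The main thread is stuck waiting for ENV_LOCK, so the
                    // queued work never ran. Without a timeout this is the hang.
                    timedOut = true;
                }
            }
            // Once the monitor is released the callback can finish.
            reconfig.get(5, TimeUnit.SECONDS);
            return timedOut;
        } finally {
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking call timed out: " + blockingCallTimesOut(200));
    }
}
```

The point of the sketch is that neither side is wrong in isolation: the hang only appears when a thread holding a monitor waits synchronously on a thread that needs the same monitor.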
>
>
> REPRODUCIBILITY
>
> This happens quite regularly on my machine, and for my users. To reproduce it
> I have launched my app while the integrated GPU is active, then launched and
> quit an app that requires the discrete GPU. One to five repetitions are
> required to create the hanging condition.
>
> I believe the issue is triggered by my use of MigLayout, which results in the
> call to CGraphicsEnvironment as per this excerpt from the stack traces above:
>
> at sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
> - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
> at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
> at net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
> at net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
>
>
> PATCH
>
> I believe the solution is to remember the main display ID along with the
> devices Map, and to change the main display ID when initDevices is called.
> This appears to work in my setup. There is however _sometimes_ a flash of
> half-size rendering, presumably while the rendering is working on the old
> device before the reconfiguration / initDevices occurs.
>
> Below is a simple patch to demonstrate that approach. Generally I don’t think
> initDevices() should ever be called on the AWT-EventQueue, but in my tests
> (as per the comment) that no longer happens with this patch.
>
> diff -r 5dd7e4bae5c2 src/macosx/classes/sun/awt/CGraphicsEnvironment.java
> --- a/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Thu Sep 22 13:17:42 2016 -0700
> +++ b/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Sat Jan 07 20:49:39 2017 +1300
> @@ -95,6 +95,7 @@
>
>      /** Available CoreGraphics displays. */
>      private final Map<Integer, CGraphicsDevice> devices = new HashMap<>(5);
> +    private int inittedMainDisplayID;
>
>      /** Reference to the display reconfiguration callback context. */
>      private final long displayReconfigContext;
> @@ -153,6 +154,7 @@
>          devices.clear();
>
>          int mainID = getMainDisplayID();
> +        inittedMainDisplayID = mainID;
>
>          // initialization of the graphics device may change
>          // list of displays on hybrid systems via an activation
> @@ -173,14 +175,13 @@
>
>      @Override
>      public synchronized GraphicsDevice getDefaultScreenDevice() throws HeadlessException {
> -        final int mainDisplayID = getMainDisplayID();
> -        CGraphicsDevice d = devices.get(mainDisplayID);
> +        CGraphicsDevice d = devices.get(inittedMainDisplayID);
>          if (d == null) {
>              // we do not expect that this may happen, the only response
>              // is to re-initialize the list of devices
>              initDevices();
>
> -            d = devices.get(mainDisplayID);
> +            d = devices.get(inittedMainDisplayID);
>              if (d == null) {
>                  throw new AWTError("no screen devices");
>              }
>
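To make the effect of the patch above easier to see outside the JDK, here is a minimal, self-contained sketch of the same idea. It is a toy model, not the real class: the IntSupplier is a hypothetical stand-in for the native getMainDisplayID(), devices are plain strings, and getInitCount() exists only so the behaviour can be observed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

public class CachedMainDisplaySketch {
    // Toy device map: display ID -> device name.
    private final Map<Integer, String> devices = new HashMap<>();
    // Hypothetical stand-in for the native getMainDisplayID() query.
    private final IntSupplier liveMainDisplayId;
    // Main display ID as of the last initDevices() call (as in the patch).
    private int inittedMainDisplayID;
    // Counts rebuilds so the sketch can show when they happen.
    private int initCount;

    public CachedMainDisplaySketch(IntSupplier liveMainDisplayId) {
        this.liveMainDisplayId = liveMainDisplayId;
        initDevices();
    }

    /** Rebuilds the device map and records the main display ID used for it. */
    public synchronized void initDevices() {
        devices.clear();
        int mainID = liveMainDisplayId.getAsInt();
        inittedMainDisplayID = mainID;
        devices.put(mainID, "device-" + mainID);
        initCount++;
    }

    /** Looks up the default device by the cached ID, not the live one. */
    public synchronized String getDefaultScreenDevice() {
        String d = devices.get(inittedMainDisplayID);
        if (d == null) {
            // Not expected with the cached ID; rebuild as a last resort.
            initDevices();
            d = devices.get(inittedMainDisplayID);
            if (d == null) {
                throw new IllegalStateException("no screen devices");
            }
        }
        return d;
    }

    public synchronized int getInitCount() {
        return initCount;
    }

    public static void main(String[] args) {
        int[] liveId = {1};
        CachedMainDisplaySketch env = new CachedMainDisplaySketch(() -> liveId[0]);
        liveId[0] = 2; // simulated GPU switch before any reconfiguration callback
        System.out.println(env.getDefaultScreenDevice() + ", rebuilds=" + env.getInitCount());
        env.initDevices(); // simulated reconfiguration callback
        System.out.println(env.getDefaultScreenDevice() + ", rebuilds=" + env.getInitCount());
    }
}
```

In this model a lookup made between the simulated GPU switch and the simulated reconfiguration keeps returning the old device rather than triggering a rebuild on the calling thread, which would be consistent with the brief flash of rendering against the old device mentioned in the report.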