Hi Karl, thanks for the submission. I'm sending this to awt-dev and
bcc'ing jdk8u-dev as that's a more appropriate venue for this discussion.
-Rob
On 11/01/17 08:44, Karl von Randow wrote:
> I have encountered a deadlock in Java 1.8.0_112 when changing between
> discrete and integrated GPU on a retina MacBook Pro. The deadlock is between:
>
> CGLGraphicsConfig.getCGLConfigInfo, running on AWT.EventQueue, trying to call
> [GraphicsConfigUtil _getCGLConfigInfo:] on
> the main thread (AppKit thread) while it holds the AWT lock and is
> synchronized on CGraphicsEnvironment.
>
> and
>
> A) the AppKit main thread trying to call
> CGraphicsEnvironment._displayReconfiguration (via displaycb_handle in
> CGraphicsEnv.m) and synchronizing on CGraphicsEnvironment, which
> results in a deadlock.
>
> or
>
> B) the AppKit main thread trying to render, and trying to acquire the
> OGLRenderQueue lock (which is the AWT lock).
>
>
> SUPPORTING STACK DUMPS
>
> - SCENARIO A
>
> CGraphicsEnvironment._displayReconfiguration is called on the main thread
> since
> 8041900: [macosx] Java forces the use of discrete GPU
> (https://bugs.openjdk.java.net/browse/JDK-8041900), which appears as
> changeset 11227.
> In the native thread dump below you can see the frame for displaycb_handle
> which is the block dispatched to the main thread to call
> CGraphicsEnvironment._displayReconfiguration.
>
> Java stacks
>
> "AWT-EventQueue-0" #16 prio=6 os_prio=31 tid=0x00007fbc72a0a800 nid=0x1251f
> runnable [0x000070000e443000]
> java.lang.Thread.State: RUNNABLE
> at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method)
> at
> sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147)
> at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64)
> at
> sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163)
> - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
> at
> sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
> - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
> at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
> at
> net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
> at
> net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
> [SNIP]
> at java.awt.Container.layout(Container.java:1510)
> at java.awt.Container.doLayout(Container.java:1499)
> at java.awt.Container.validateTree(Container.java:1695)
> [SNIP]
> at
> javax.swing.RepaintManager$ProcessingRunnable.run(RepaintManager.java:1750)
> [SNIP]
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>
> "AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fbc75046800 nid=0x307
> waiting for monitor entry [0x00007fff5b579000]
> java.lang.Thread.State: BLOCKED (on object monitor)
> at
> sun.awt.CGraphicsEnvironment._displayReconfiguration(CGraphicsEnvironment.java:129)
> - waiting to lock <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
>
> Native stacks
>
> Thread 0x6828c3 DispatchQueue 1 Thread name "AppKit
> Thread" 1000 samples (1-1000) priority 46 (base
> 46) cpu time <0.001
> 1000 start + 52 (Charles + 5156) [0x104682424]
> 1000 main + 153 (Charles + 5321) [0x1046824c9]
> 1000 launch + 10872 (Charles + 16520) [0x104685088]
> 1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x104702624]
> 1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781)
> [0x1047068fd]
> 1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380)
> [0x7fff97665974]
> 1000 __CFRunLoopRun + 934 (CoreFoundation + 556918)
> [0x7fff97665f76]
> 1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741)
> [0x7fff97666a7d]
> 1000
> __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
> (CoreFoundation + 686465) [0x7fff97685981]
> 1000 __NSThreadPerformPerform + 326 (Foundation +
> 465034) [0x7fff990c988a]
> 1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib
> + 286207) [0x113081dff]
> 1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156
> (libosxapp.dylib + 8525) [0x1130fa14d]
> [SNIP]
> 1000 +[JNFRunLoop
> _performCopiedBlock:] + 17 (JavaNativeFoundation + 28474) [0x112d0df3a]
> 1000
> __displaycb_handle_block_invoke_1 + 172 (libawt_lwawt.dylib + 119659)
> [0x11305936b]
> 1000
> JNFPerformEnvBlock + 87 (JavaNativeFoundation + 27229) [0x112d0da5d]
> 1000
> __displaycb_handle_block_invoke_2 + 80 (libawt_lwawt.dylib + 119988)
> [0x1130594b4]
> 1000
> JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x112d0a5af]
> 1000
> jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x106301539]
> 1000
> jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType,
> _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227)
> [0x10630ec03]
> 1000
> JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396]
> 1000 ???
> [0x113db94e7]
> 1000 ???
> [0x113dde021]
> 1000
> InterpreterRuntime::monitorenter(JavaThread*, BasicObjectLock*) + 165
> (libjvm.dylib + 2995347) [0x1062ef493]
> 1000
> ObjectMonitor::enter(Thread*) + 472 (libjvm.dylib + 4524724) [0x106464ab4]
>
> 1000 ObjectMonitor::EnterI(Thread*) + 532 (libjvm.dylib + 4521584)
> [0x106463e70]
>
> 1000 os::PlatformEvent::park(long) + 404 (libjvm.dylib + 4561328)
> [0x10646d9b0]
>
> 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
>
> *1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
>
> Thread 0x68292c Thread name "Java: AWT-EventQueue-0"
> 1000 samples (1-1000) priority 31 (base 31)
> 1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd]
> 1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839)
> [0x7fffacdd59f7]
> 1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019)
> [0x7fffacdd5aab]
> 1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x106470d2a]
> 1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10654f748]
> 1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib +
> 5480593) [0x10654e091]
> 1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib +
> 3270354) [0x1063326d2]
> 1000 JavaCalls::call_virtual(JavaValue*, Handle,
> KlassHandle, Symbol*, Symbol*, Thread*) + 74 (libjvm.dylib + 3017936)
> [0x1062f4cd0]
> 1000 JavaCalls::call_virtual(JavaValue*, KlassHandle,
> Symbol*, Symbol*, JavaCallArguments*, Thread*) + 356 (libjvm.dylib + 3017508)
> [0x1062f4b24]
> 1000 JavaCalls::call_helper(JavaValue*, methodHandle*,
> JavaCallArguments*, Thread*) + 1710 (libjvm.dylib + 3015574) [0x1062f4396]
> [SNIP]
>
> 1000 Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo
> + 279 (libawt_lwawt.dylib + 107562) [0x11305642a]
>
> 1000 -[NSObject(NSThreadPerformAdditions)
> performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation +
> 203394) [0x7fff99089a82]
>
> 1000 -[NSObject(NSThreadPerformAdditions)
> performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation +
> 204424) [0x7fff99089e88]
>
> 1000 -[NSCondition wait] + 240 (Foundation + 208331)
> [0x7fff9908adcb]
>
> 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib +
> 105606) [0x7fffaccecc86]
>
> *1000 psynch_cvcontinue + 0 (pthread + 39138)
> [0xffffff7f80f978e2]
>
>
>
>
>
>
> - SCENARIO B
>
> Java stacks
>
> "AWT-EventQueue-0" #15 prio=6 os_prio=31 tid=0x00007fba611d2000 nid=0x1260f
> runnable [0x0000700005365000]
> java.lang.Thread.State: RUNNABLE
> at sun.java2d.opengl.CGLGraphicsConfig.getCGLConfigInfo(Native Method)
> at
> sun.java2d.opengl.CGLGraphicsConfig.getConfig(CGLGraphicsConfig.java:147)
> at sun.awt.CGraphicsDevice.<init>(CGraphicsDevice.java:64)
> at
> sun.awt.CGraphicsEnvironment.initDevices(CGraphicsEnvironment.java:163)
> - locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment)
> at
> sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
> - locked <0x00000006c0df8c18> (a sun.awt.CGraphicsEnvironment)
> at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
> at
> net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
> at
> net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
> at net.miginfocom.layout.UnitValue.getPixelsExact(UnitValue.java:305)
> at net.miginfocom.layout.UnitValue.getPixels(UnitValue.java:281)
> [SNIP]
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)
>
> "AppKit Thread" #11 daemon prio=5 os_prio=31 tid=0x00007fba59869800 nid=0x307
> waiting on condition [0x00007fff52ac2000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000006c053b688> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
> at sun.awt.SunToolkit.awtLock(SunToolkit.java:253)
> at sun.java2d.pipe.RenderQueue.lock(RenderQueue.java:112)
> at sun.java2d.opengl.CGLLayer.drawInCGLContext(CGLLayer.java:139)
>
> Native stacks
>
> Thread 0x6764ca DispatchQueue 1 Thread name "AppKit
> Thread" 1000 samples (1-1000) priority 46 (base
> 46)
> 1000 start + 52 (Charles + 5156) [0x10d139424]
> 1000 main + 153 (Charles + 5321) [0x10d1394c9]
> 1000 launch + 10872 (Charles + 16520) [0x10d13c088]
> 1000 JLI_Launch + 1952 (libjli.dylib + 5668) [0x10d1b9624]
> 1000 CreateExecutionEnvironment + 871 (libjli.dylib + 22781)
> [0x10d1bd8fd]
> 1000 CFRunLoopRunSpecific + 420 (CoreFoundation + 555380)
> [0x7fff97665974]
> 1000 __CFRunLoopRun + 934 (CoreFoundation + 556918)
> [0x7fff97665f76]
> 1000 __CFRunLoopDoSources0 + 557 (CoreFoundation + 559741)
> [0x7fff97666a7d]
> 1000
> __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
> (CoreFoundation + 686465) [0x7fff97685981]
> 1000 __NSThreadPerformPerform + 326 (Foundation +
> 465034) [0x7fff990c988a]
> 1000 -[AWTStarter starter:] + 905 (libawt_lwawt.dylib
> + 286207) [0x12abc3dff]
> 1000 +[NSApplicationAWT runAWTLoopWithApp:] + 156
> (libosxapp.dylib + 8525) [0x12ac3c14d]
> [SNIP]
> 1000
> CA::Transaction::observer_callback(__CFRunLoopObserver*, unsigned long,
> void*) + 108 (QuartzCore + 69522) [0x7fff9d393f92]
> 1000
> CA::Transaction::commit() + 475 (QuartzCore + 67121) [0x7fff9d393631]
> 1000
> CA::Context::commit_transaction(CA::Transaction*) + 280 (QuartzCore +
> 1153144) [0x7fff9d49c878]
> 1000
> CA::Layer::layout_and_display_if_needed(CA::Transaction*) + 35 (QuartzCore +
> 1196185) [0x7fff9d4a7099]
> 1000
> CA::Layer::display_if_needed(CA::Transaction*) + 572 (QuartzCore + 1195886)
> [0x7fff9d4a6f6e]
> 1000 -[CAOpenGLLayer
> _display] + 351 (QuartzCore + 1117583) [0x7fff9d493d8f]
> 1000
> CAOpenGLLayerDraw(CAOpenGLLayer*, double, CVTimeStamp const*, unsigned int) +
> 873 (QuartzCore + 1118737) [0x7fff9d494211]
> 1000 -[CGLLayer
> drawInCGLContext:pixelFormat:forLayerTime:displayTime:] + 287
> (libawt_lwawt.dylib + 109022) [0x12ab989de]
> 1000
> JNFCallVoidMethod + 187 (JavaNativeFoundation + 13743) [0x12a84f5af]
> 1000
> jni_CallVoidMethodV + 248 (libjvm.dylib + 3069241) [0x10edb8539]
> 1000
> jni_invoke_nonstatic(JNIEnv_*, JavaValue*, _jobject*, JNICallType,
> _jmethodID*, JNI_ArgumentPusher*, Thread*) + 748 (libjvm.dylib + 3124227)
> [0x10edc5c03]
> 1000
> JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
> Thread*) + 1710 (libjvm.dylib + 3015574) [0x10edab396]
> 1000
> ??? [0x10ffa0854]
> 1000
> ??? [0x11027642a]
>
> 1000 Unsafe_Park + 126 (libjvm.dylib + 5571927) [0x10f01b557]
>
> 1000 Parker::park(bool, long) + 495 (libjvm.dylib + 4560765) [0x10ef2477d]
>
> 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib + 105606) [0x7fffaccecc86]
>
> *1000 psynch_cvcontinue + 0 (pthread + 39138) [0xffffff7f80f978e2]
>
> Thread 0x67652b Thread name "Java: AWT-EventQueue-0"
> 1000 samples (1-1000) priority 31 (base 31)
> 1000 thread_start + 13 (libsystem_pthread.dylib + 12797) [0x7fffacdd51fd]
> 1000 _pthread_start + 286 (libsystem_pthread.dylib + 14839)
> [0x7fffacdd59f7]
> 1000 _pthread_body + 180 (libsystem_pthread.dylib + 15019)
> [0x7fffacdd5aab]
> 1000 java_start(Thread*) + 246 (libjvm.dylib + 4574506) [0x10ef27d2a]
> 1000 JavaThread::run() + 448 (libjvm.dylib + 5486408) [0x10f006748]
> 1000 JavaThread::thread_main_inner() + 155 (libjvm.dylib +
> 5480593) [0x10f005091]
> 1000 thread_entry(JavaThread*, Thread*) + 124 (libjvm.dylib +
> 3270354) [0x10ede96d2]
> [SNIP]
>
> 1000 ??? [0x10f7a0734]
>
> 1000
> Java_sun_java2d_opengl_CGLGraphicsConfig_getCGLConfigInfo + 279
> (libawt_lwawt.dylib + 107562) [0x12ab9842a]
>
> 1000 -[NSObject(NSThreadPerformAdditions)
> performSelectorOnMainThread:withObject:waitUntilDone:] + 131 (Foundation +
> 203394) [0x7fff99089a82]
>
> 1000 -[NSObject(NSThreadPerformAdditions)
> performSelector:onThread:withObject:waitUntilDone:modes:] + 904 (Foundation +
> 204424) [0x7fff99089e88]
>
> 1000 -[NSCondition wait] + 240 (Foundation + 208331)
> [0x7fff9908adcb]
>
> 1000 __psynch_cvwait + 10 (libsystem_kernel.dylib +
> 105606) [0x7fffaccecc86]
>
> *1000 psynch_cvcontinue + 0 (pthread + 39138)
> [0xffffff7f80f978e2]
>
>
> INTERPRETATION
>
> The deadlock results from a race condition that occurs when macOS switches
> between the discrete and integrated GPU.
>
> When the GPU changes, CGraphicsEnvironment.getMainDisplayID() immediately
> starts returning the new display ID (a comment in CGraphicsEnvironment.m
> notes that the display ID changes in this case, and I have verified this),
> while the devices map is only rebuilt once initDevices() is called.
>
> CGLGraphicsConfig.getCGLConfigInfo (which is called as a consequence of
> initDevices, as per the stack traces) calls out to the AppKit main thread
> and waits for it. I think this is always dangerous because of the locks
> held by the calling code. I think we should avoid calling getCGLConfigInfo
> on anything but the AppKit main thread. I believe this was the intention of
> 8041900: [macosx] Java forces the use of discrete GPU
> (https://bugs.openjdk.java.net/browse/JDK-8041900).
>
> CGraphicsEnvironment.getDefaultScreenDevice() is called from AWT layout code
> (as per the stacks) and it calls CGraphicsEnvironment.getMainDisplayID() each
> time. If CGraphicsEnvironment.getDefaultScreenDevice() is called _after_ the
> GPU change, but _before_ CGraphicsEnvironment._displayReconfiguration() has
> been called, the CGraphicsDevice for the new display ID cannot be found in
> the devices Map, so initDevices() is called from
> CGraphicsEnvironment.getDefaultScreenDevice() on the AWT-EventQueue thread.
>
> There is a comment in getDefaultScreenDevice() for this case:
>   "we do not expect that this may happen, the only response is to
>   re-initialize the list of devices"
>
> Calling initDevices() here results in a call to
> CGLGraphicsConfig.getCGLConfigInfo, which then calls
> [GraphicsConfigUtil _getCGLConfigInfo:] on the AppKit main thread and waits
> for the result.
>
> As the current thread (AWT Event queue) is holding the AWT lock, and is
> synchronized on CGraphicsEnvironment, the two deadlock
> conditions described above can occur.
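
To make the hand-off described above concrete, here is a minimal sketch in plain Java. It is not the JDK code: the class and method names are hypothetical, a single-thread executor stands in for the AppKit main thread, and a plain object monitor stands in for the CGraphicsEnvironment lock (scenario A). A timeout is used only so the demo reports the hang instead of hanging itself.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of "hold a lock, then call the main thread and wait". */
final class MainThreadHandOffSketch {

    /**
     * Returns true if the synchronous hand-off completed, false if it timed out
     * (in the real code there is no timeout, so this case is the deadlock).
     */
    static boolean syncHandOff(boolean holdLock, long timeoutMillis) {
        Object env = new Object(); // stands in for the CGraphicsEnvironment monitor
        ExecutorService mainThread = Executors.newSingleThreadExecutor(); // stands in for AppKit
        try {
            if (holdLock) {
                synchronized (env) { // like initDevices() running on the event queue thread
                    return callMainThreadAndWait(mainThread, env, timeoutMillis);
                }
            }
            return callMainThreadAndWait(mainThread, env, timeoutMillis);
        } finally {
            mainThread.shutdownNow();
        }
    }

    private static boolean callMainThreadAndWait(ExecutorService mainThread, Object env,
                                                 long timeoutMillis) {
        // A display reconfiguration callback is already queued on the main thread
        // and needs the same monitor (like _displayReconfiguration).
        mainThread.submit(() -> {
            synchronized (env) {
                // reconfiguration work would happen here
            }
        });
        // The caller now asks the main thread for config info and waits for the
        // answer (like getCGLConfigInfo with waitUntilDone).
        Future<String> info = mainThread.submit(() -> "config info");
        try {
            info.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the main thread is stuck behind the monitor we hold
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("holding lock:  completed=" + syncHandOff(true, 200));
        System.out.println("lock released: completed=" + syncHandOff(false, 2000));
    }
}
```

When the caller holds the monitor, the queued callback blocks the single "main" thread and the request behind it never runs, so the first call should report completed=false; without the monitor held, the same hand-off should complete.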
>
>
> REPRODUCIBILITY
>
> This happens quite regularly on my machine, and for my users. To reproduce it
> I have launched my app while the integrated GPU is active, then launched and
> quit an app that requires the discrete GPU. One to five repetitions are
> required to create the hanging condition.
>
> I believe the issue is triggered by my use of MigLayout, which results in the
> call to CGraphicsEnvironment as per this excerpt from the stack traces above:
>
> at
> sun.awt.CGraphicsEnvironment.getDefaultScreenDevice(CGraphicsEnvironment.java:181)
> - locked <0x00000006c0721bb0> (a sun.awt.CGraphicsEnvironment)
> at sun.lwawt.macosx.LWCToolkit.getScreenResolution(LWCToolkit.java:415)
> at
> net.miginfocom.swing.SwingComponentWrapper.getHorizontalScreenDPI(SwingComponentWrapper.java:260)
> at
> net.miginfocom.swing.SwingComponentWrapper.getPixelUnitFactor(SwingComponentWrapper.java:119)
>
>
> PATCH
>
> I believe the solution is to remember the main display ID along with the
> devices Map, and to update the remembered ID only when initDevices is called.
> This appears to work in my setup. There is, however, _sometimes_ a flash of
> half-size rendering, presumably while rendering still targets the old device
> before the reconfiguration / initDevices occurs.
>
> Below is a simple patch to demonstrate that approach. Generally I don’t think
> initDevices() should ever be called on the AWT-EventQueue thread, and in my
> tests (as per the comment) that no longer happens with this patch.
>
> diff -r 5dd7e4bae5c2 src/macosx/classes/sun/awt/CGraphicsEnvironment.java
> --- a/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Thu Sep 22 13:17:42 2016 -0700
> +++ b/src/macosx/classes/sun/awt/CGraphicsEnvironment.java Sat Jan 07 20:49:39 2017 +1300
> @@ -95,6 +95,7 @@
>
> /** Available CoreGraphics displays. */
> private final Map<Integer, CGraphicsDevice> devices = new HashMap<>(5);
> + private int inittedMainDisplayID;
>
> /** Reference to the display reconfiguration callback context. */
> private final long displayReconfigContext;
> @@ -153,6 +154,7 @@
> devices.clear();
>
> int mainID = getMainDisplayID();
> + inittedMainDisplayID = mainID;
>
> // initialization of the graphics device may change
> // list of displays on hybrid systems via an activation
> @@ -173,14 +175,13 @@
>
> @Override
> public synchronized GraphicsDevice getDefaultScreenDevice() throws HeadlessException {
> - final int mainDisplayID = getMainDisplayID();
> - CGraphicsDevice d = devices.get(mainDisplayID);
> + CGraphicsDevice d = devices.get(inittedMainDisplayID);
> if (d == null) {
> // we do not expect that this may happen, the only response
> // is to re-initialize the list of devices
> initDevices();
>
> - d = devices.get(mainDisplayID);
> + d = devices.get(inittedMainDisplayID);
> if (d == null) {
> throw new AWTError("no screen devices");
> }
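
For readers who want to see the effect of the patch in isolation, the following is a simplified, hypothetical model of the two lookup strategies. It is not the JDK class: device objects are replaced by strings, the native getMainDisplayID() is replaced by an injected IntSupplier, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

/** Hypothetical, simplified stand-in for CGraphicsEnvironment. */
final class DeviceRegistrySketch {
    private final IntSupplier liveMainDisplayId; // stands in for getMainDisplayID()
    private final Map<Integer, String> devices = new HashMap<>();
    private int inittedMainDisplayId;            // ID remembered at the last initDevices()
    int initCount;                               // how many times initDevices() has run

    DeviceRegistrySketch(IntSupplier liveMainDisplayId) {
        this.liveMainDisplayId = liveMainDisplayId;
        initDevices();
    }

    synchronized void initDevices() {
        initCount++;
        devices.clear();
        int mainId = liveMainDisplayId.getAsInt();
        inittedMainDisplayId = mainId;
        devices.put(mainId, "device-" + mainId);
    }

    /** Original lookup: keyed by the live ID, so a GPU switch forces a re-init on the caller's thread. */
    synchronized String defaultDeviceByLiveId() {
        int id = liveMainDisplayId.getAsInt();
        String d = devices.get(id);
        if (d == null) {
            initDevices();
            d = devices.get(id);
        }
        return d;
    }

    /** Patched lookup: keyed by the ID remembered when the devices map was last built. */
    synchronized String defaultDeviceByRememberedId() {
        String d = devices.get(inittedMainDisplayId);
        if (d == null) {
            initDevices();
            d = devices.get(inittedMainDisplayId);
        }
        return d;
    }

    public static void main(String[] args) {
        int[] liveId = {1};
        DeviceRegistrySketch env = new DeviceRegistrySketch(() -> liveId[0]);
        liveId[0] = 2; // GPU switch: the live ID changes before any reconfiguration callback runs
        System.out.println(env.defaultDeviceByRememberedId() + " inits=" + env.initCount);
        System.out.println(env.defaultDeviceByLiveId() + " inits=" + env.initCount);
    }
}
```

In this model the remembered-ID lookup keeps returning the old device without re-initializing, which matches the occasional flash of stale rendering mentioned above, while the live-ID lookup triggers initDevices() on whichever thread happens to ask.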
>
>
>
>