Re: RFR: JDK-8203026: java.rmi.NoSuchObjectException: no such object in table
Here's an updated webrev that includes the change to bind to the stub. The jstatd tests continue to pass after this change. Webrev: http://cr.openjdk.java.net/~gadams/8203026/webrev.01/

It would be good to have a second reviewer from the serviceability team. Longer term it would be good to replace the RMI dependency, which does not appear to be central to the functionality provided here.

On 3/22/19, 9:56 AM, Roger Riggs wrote: Hi Gary, Holding a static reference to the implementation solves the problem. But I noticed that the object that is bound in the registry is the RemoteHostImpl and it should be the RemoteHost stub. Line 145 should be: bind(name.toString(), stub); That is likely to solve the problem as well, because the distributed garbage collector will correctly cause a hard reference to be retained to the implementation. Roger

On 03/22/2019 04:05 AM, gary.ad...@oracle.com wrote: Here's a proposed fix for the intermittent jstatd test failure. By moving the exported object from a local method variable to a static class variable, a strong reference is held so the object will not be garbage collected prematurely. Webrev: http://cr.openjdk.java.net/~gadams/8203026/webrev.00/ Issue: https://bugs.openjdk.java.net/browse/JDK-8203026 The test is currently filed as core-libs/java.rmi, but it probably should be core-svc/tools with label=jstatd. It is the caller's responsibility to hold a strong reference to prevent premature garbage collection, so this is a test bug rather than a need to change the underlying RMI behavior.
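The pattern Roger describes can be sketched as follows. The Echo interface, the registry port choice, and all names here are hypothetical stand-ins for jstatd's RemoteHost/RemoteHostImpl, not the actual jstatd code; the two essential moves are the static field pinning the implementation and binding the stub rather than the impl:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class BindStubExample {
    // Hypothetical remote interface standing in for jstatd's RemoteHost.
    public interface Echo extends Remote {
        String echo(String s) throws RemoteException;
    }

    // A static field pins the implementation: the RMI runtime holds
    // exported objects only weakly until DGC sees a live remote
    // reference, so without this the impl can be collected and
    // unexported prematurely (the NoSuchObjectException symptom).
    private static Echo impl;

    static String run() throws Exception {
        impl = s -> s;
        // exportObject returns the stub; bind the stub, not the impl,
        // so the distributed GC retains a hard reference to the impl.
        Echo stub = (Echo) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(0);
        registry.rebind("Echo", stub);
        Echo proxy = (Echo) registry.lookup("Echo");
        String result = proxy.echo("ok");
        // Clean up so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```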
Re: (trivial doc fix) RFR: 8220262: fix headings in java.logging
Doesn't the generated javadoc include an h2 for "Class LogManager"? That would make the h3 for "LogManager Configuration" the correct nesting level, even though it does not appear in the individual source file.

On 3/12/19, 8:06 AM, Daniel Fuchs wrote: Hi, Please find below a trivial doc fix for: 8220262: fix headings in java.logging https://bugs.openjdk.java.net/browse/JDK-8220262 The change is just:

    - * <h3>LogManager Configuration</h3>
    + * <h2>LogManager Configuration</h2>

full patch below. best regards, -- daniel

    diff --git a/src/java.logging/share/classes/java/util/logging/LogManager.java b/src/java.logging/share/classes/java/util/logging/LogManager.java
    --- a/src/java.logging/share/classes/java/util/logging/LogManager.java
    +++ b/src/java.logging/share/classes/java/util/logging/LogManager.java
    @@ -1,5 +1,5 @@
     /*
    - * Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
    + * Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
      *
      * This code is free software; you can redistribute it and/or modify it
    @@ -64,7 +64,7 @@
      * At startup the LogManager class is located using the
      * java.util.logging.manager system property.
      *
    - * <h3>LogManager Configuration</h3>
    + * <h2>LogManager Configuration</h2>
      *
      * A LogManager initializes the logging configuration via
      * the {@link #readConfiguration()} method during LogManager initialization.
Re: [9] RFR: Static build of libzip is missing JNI_OnLoad_zip entry point
Never mind. You are right. The static case is fine with your fix. There won't be a JNI_OnLoad entry for the dynamic case, but that entry point is optional.

On 05/25/16 12:44, Naoto Sato wrote: Attached is the output from "nm -g build/macosx_x64/jdk/lib/libzip.a" with the proposed fix applied. I see the symbol "_JNI_OnLoad_zip" in it. Am I missing something? Naoto

On 5/25/16 9:01 AM, Gary Adams wrote: This fix will not work for the macosx static build, e.g. configure --enable-static-build=yes. When linking static libraries, the JNI_OnLoad_zip entry point is needed so that symbol lookups are performed on the current executable, rather than on a dynamic library.

On 05/24/16 19:47, Naoto Sato wrote: Hello, The previous attempt to fix this one failed in the installer build, as it does not recognize the added macro. I've updated the fix to add an extra check for static build (yeah, this is redundant with the real jni_util.h, but effectively avoids the installer build failure). http://cr.openjdk.java.net/~naoto/8150179/webrev.01/ Please review. Naoto
Re: Code Review - CR# 7030573 insufficient disk space for large file test
On 01/06/12 03:32 PM, Alan Bateman wrote:

On 06/01/2012 19:47, Gary Adams wrote: I've updated the webrev with the temp file created in the current directory. I'm not sure what to do about the case where only a little space is available and only a small file will be used. Should the test fail and force the test operator to create a new test environment where 7G of space is available? I lean toward allowing the test to pass using the space that is available.

I looked briefly at the updated webrev and I see it checks the usable space on the file system/volume where the current directory is but creates the file in the temporary directory, easily done :-) I think you're right that the best we can do when there is insufficient space is just to run with the small file. It doesn't test the original issue but we don't want this test failing when there isn't sufficient space. Rather than creating largefile, deleting it, and then creating it again (twice), how about:

    File file = File.createTempFile("largefile", null, new File("."));
    long usableSpace = file.getUsableSpace();
    if (usableSpace == 0L)
        throw new RuntimeException(...);
    long size = Math.min(usableSpace, 7405576182L);
    recreateAsLargeFile(file, size);

A fresh webrev is available: http://cr.openjdk.java.net/~gadams/7030573/ Includes the following changes:
- moved temp file create into main
- use Math.min for selecting 7G or space available and for skip size
- added runtime exception if no space available
- reformatting to wrap 80-column lines
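The sizing logic Alan sketches can be made concrete as below. The chooseSize() helper and the 7405576182L constant follow the discussion; the class name and the decision to only report (rather than actually recreate) the large file are illustrative, since the real test's recreateAsLargeFile is not shown in the thread:

```java
import java.io.File;
import java.io.IOException;

public class LargeFileSize {
    // Target size from the test: a bit under 7 GB.
    static final long BIG = 7405576182L;

    // Pick the file size: the full 7 GB when the volume has room,
    // otherwise whatever is available (the test then exercises less,
    // but does not fail spuriously on small volumes).
    static long chooseSize(long usableSpace) {
        if (usableSpace == 0L)
            throw new RuntimeException("no usable space reported");
        return Math.min(usableSpace, BIG);
    }

    public static void main(String[] args) throws IOException {
        // Create the placeholder in the current directory, not /tmp,
        // so jtreg cleans it up and swap-backed tmpfs is not an issue.
        File file = File.createTempFile("largefile", null, new File("."));
        long size = chooseSize(file.getUsableSpace());
        System.out.println("would recreate " + file + " at " + size + " bytes");
        file.delete();
    }
}
```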
Re: Code Review - CR# 7030573 insufficient disk space for large file test
On 01/05/12 08:30 AM, Alan Bateman wrote:

On 23/12/2011 16:19, Gary Adams wrote: The LargeFileAvailable regression test had intermittent failures when there was not sufficient space available to create a 7G temp file. This webrev presents a simple check to see if the available usable space is less than 7G and scales the test back accordingly. The original bug report suggests that the test be switched to use the current working directory rather than a temp file. I think that could be the wrong choice for an embedded system that might have the tests mounted from a remote file system. In that scenario, using the local temp file space provides a better solution for what this test is designed to check. http://cr.openjdk.java.net/~gadams/7030573/

The only thing is that when the test is scaled back too much then it no longer tests the original issue. This test will create a sparse file on file systems that support it, and I suspect the reason it fails on Solaris is that /tmp is backed by swap. It might be better if we changed the test to create the file in the current directory (or a sub-directory of it). It will be removed by jtreg if the test doesn't delete it. -Alan

I've updated the webrev with the temp file created in the current directory. I'm not sure what to do about the case where only a little space is available and only a small file will be used. Should the test fail and force the test operator to create a new test environment where 7G of space is available? I lean toward allowing the test to pass using the space that is available.
Re: 7121110 : JAXP 1.4.5 update 1 for 7u4 jaxp source in jdk7 repo
I see in the webrev the removal of drops.dir from the ant build.xml. Should similar updates be applied to README-builds.html for ALT_DROPS_DIR and ALLOW_DOWNLOADS? Might be good to put a note in $(ALT_JDK_DEVTOOLS_PATH)/share/jdk8-drops indicating the files are now included directly in the jdk8 repos.

On 12/28/11 12:18 PM, Joe Wang wrote:

On 12/28/2011 2:56 AM, Alan Bateman wrote: On 23/12/2011 21:27, Joe Wang wrote: Hi All, We have prepared a jaxp update for 7u4. The listed changes have been posted to the jdk7u-dev alias with "7u4 Request for approval for CR 7121110 - JAXP 1.4.5 update 1 for 7u4". With this update, we are adding jaxp sources back into the jdk7 repository, which means we are abandoning the jaxp bundle download process. This is a significant change. We'd like to hear if you have any concerns. If no objection, we'd like to push the change within the next week. The webrev is here: http://cr.openjdk.java.net/~joehw/jaxp145u1-changeset/webrev/

This is good news. So what about jdk8? I think the protocol with 7u is that changes go into 8 first. Also, are you expecting to keep the code in 7u and 8 in sync? -Alan.

Kelly asked for 6-open and jdk8 as well. Since we're changing the jaxp bundle process, we thought we would do the same for 6-open and jdk8 once this works out. But we can take the change to jdk8 first if this is approved. The only question I had was that the jdk8 modularization would change the jaxp source structure. But Kelly thought that's not a problem. I guess we don't mind another big changeset :) -Joe
Code Review - CR #7114195 -Short timeouts on regression tests
A number of regression tests have designated short timeouts and have been the cause of intermittent test failures on slow machines or on builds with slower runtime execution speeds, e.g. -Xcomp, etc. The jtreg regression harness uses 2 minutes as the default timeout value for forcing a test failure. http://openjdk.java.net/jtreg/tag-spec.txt Some tests designated less than 120 seconds as a preferred timeout value. This changeset removes timeouts set lower than the default jtreg limit.

To accommodate slower configurations the harness is run with a 4x timeoutFactor from the default test makefile (jdk/test/Makefile):

    JTREG_TIMEOUT_OPTION = -timeoutFactor:4

This ensures that slow machines are not terminated prematurely on automated regression test runs. Webrev at: http://cr.openjdk.java.net/~gadams/7114195/
Re: Code Review - 6957683 TEST_BUG: test/java/util/concurrent/ThreadPoolExecutor/Custom.java failing
Revised webrev is available:
- removed the latch logic not required for main to worker thread coordination
- increased the timeout to allow service shutdown up to 120 seconds (will exit sooner if services finish)
- increased delay after service shutdown to ensure thread counts will be updated.
http://cr.openjdk.java.net/~gadams/6957683/

On 11/21/11 09:33 PM, David Holmes wrote: Gary,

On 22/11/2011 6:26 AM, Gary Adams wrote: Not quite as sure about the internal timeouts in this regression test. Here's a proposed change that cranks up the timeouts. Since the test harness defaults to 2 minutes there's no reason the service termination should time out more quickly. For the thread startup, I added a countdown latch so the main thread waits till the worker threads have started. Also, after the service termination completes, increased the delay to ensure the thread count check will be accurate. http://cr.openjdk.java.net/~gadams/6957683/

Sorry but I think my initial analysis of this problem was incorrect.
The initial test failure reported 3 failed cases:

    1 not equal to 0
    11 not equal to 10
    1 not equal to 0

which corresponded to failures at the marked lines:

    public static void main(String[] args) throws Throwable {
        CustomTPE tpe = new CustomTPE();
        equal(tpe.getCorePoolSize(), threadCount);
        equal(countExecutorThreads(), 0);
        for (int i = 0; i < threadCount; i++)
            tpe.submit(new Runnable() { public void run() {}});
        equal(countExecutorThreads(), threadCount);
        equal(CustomTask.births.get(), threadCount);
        tpe.shutdown();
        tpe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        Thread.sleep(10);
/*1*/   equal(countExecutorThreads(), 0);

        CustomSTPE stpe = new CustomSTPE();
        for (int i = 0; i < threadCount; i++)
            stpe.submit(new Runnable() { public void run() {}});
        equal(CustomSTPE.decorations.get(), threadCount);
/*2*/   equal(countExecutorThreads(), threadCount);
        stpe.shutdown();
        stpe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        Thread.sleep(10);
/*3*/   equal(countExecutorThreads(), 0);

        System.out.printf("%nPassed = %d, failed = %d%n%n", passed, failed);
        if (failed > 0) throw new Exception("Some tests failed");
    }

However, I think the failure at /*2*/ is actually seeing the same extra thread as reported at /*1*/. It could even be the same thread again at /*3*/ but we have other independent failure cases for /*3*/. When we create the STPE, each submit will create and start a worker thread until we reach the core pool size. These worker threads will be immediately visible to the countExecutorThreads() logic because it simply enumerates the ThreadGroup. I was mistakenly thinking that these threads need to actually get a chance to execute to become visible to the counting logic, but that is not the case. Consequently you can elide the change to add the latch and just leave the defensive timeouts on the awaitTermination with the increased sleep delays to give the worker threads a chance to really terminate. Sorry for sending you down the wrong path on this one. David
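David's point - that a worker thread is visible to ThreadGroup enumeration as soon as start() returns, before it has executed any code - can be demonstrated with a small sketch. countWorkers() here is a reconstruction by analogy, not the test's actual countExecutorThreads(), and the thread name prefix is illustrative:

```java
import java.util.concurrent.CountDownLatch;

public class CountGroupThreads {
    // Count live threads whose name marks them as pool workers, by
    // enumerating the current thread group (the approach the test's
    // countExecutorThreads() is described as using).
    static int countWorkers(String prefix) {
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        Thread[] threads = new Thread[group.activeCount() + 10];
        int n = group.enumerate(threads, true);
        int count = 0;
        for (int i = 0; i < n; i++)
            if (threads[i].getName().startsWith(prefix))
                count++;
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch block = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try { block.await(); } catch (InterruptedException ignored) {}
        }, "CustomWorker-1");
        worker.start();
        // The worker is already counted here, even though it may not
        // have been scheduled yet: start() adds it to the group.
        System.out.println("workers = " + countWorkers("CustomWorker"));
        block.countDown();
        worker.join();
    }
}
```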
Code review - 6731620 TEST_BUG: java/util/Timer/Args.java is too optimistic about the execution time of System.out.printf
I think this change is ready to commit, just need a sponsor for the last push (Mandy/David?). http://cr.openjdk.java.net/~gadams/6731620/
Code Review - 6860309 TEST_BUG: Insufficient sleep time in java/lang/Runtime/exec/StreamsSurviveDestroy.java
I think this change is ready to commit, just need a sponsor for the last push (Mandy/David?). http://cr.openjdk.java.net/~gadams/6860309/
Re: Passing time factor to tests run under jtreg
On 11/15/11 8:33 PM, David Holmes wrote: Hi Gary,

On 16/11/2011 6:14 AM, Gary Adams wrote: I've been scanning a number of the slow machine test bugs that are reported and wanted to check to see if anyone has looked into time dependencies in the regression tests previously. From what I've been able to learn so far, individual tests can use the timeout parameter to indicate to the test harness an expected time to run. The test harness has command line arguments where it can filter out tests that take too long (timelimit) or can apply a multiplier to the timeout when conditions are known to slow down the process (timeoutFactor), e.g. 8X for a slow machine or running with -Xcomp. I see that there are some wrappers that can be applied around running a particular test to allow processing before main(). Could this mechanism be exploited so the harness command line options could be made known to the time dependent tests as command line arguments or as system properties? My thought is the current timeout granularity is too large and only applies to the full test execution. If a test knew that a timeoutFactor was to be applied, it could internally adjust the time dependent delays appropriately, e.g. not every sleep(), await(), join() with timeouts would need the timeoutFactor applied.

I don't quite get what you mean about the timeouts applied to sleeps, awaits etc. Depending on the test, some of these are delays (eg sleep is usually used this way) because it may not be feasible (or even possible) to coordinate the threads directly; while others (await, wait etc) are actual timeouts - if they expire it is an error because something appears to have gone wrong somewhere (of course this can be wrong because the timeout was too short for a given situation).

The timeout being too short for a slow machine is the specific condition I'd like to address.
As many of these tests have evolved along with the testing infrastructure, it isn't always very clear who has responsibility for programming defensive timeouts. And many tests are designed to be run stand-alone or under a test harness, where exceptions due to timeouts are preferable to hangs. Some of the bug reports state an intermittent failure. Could be failing due to a busy machine, slow machine or aggressive command line options.

If a test has an advertised timeout, then it needs to be honored when run stand-alone. If aggressive command line options are used, the timeoutFactor multiplier can compensate for the whole test timeout. I believe we can identify slow machines and apply the same timeoutFactor multiplier.

Further, while we can add a scale factor for known retarding factors - like Xcomp - there's no general way to assess the target machine capability (# cores, speed) and load, as it may impact a given test. And historically there has been a lack of resources to investigate and solve these issues.

I'm not sure why test machine characteristics are not known at test time. Obviously some agent setup has been done to identify chip architecture and operating system. The cpu speed is just another attribute of the test machine. I understand the lack of resources to address this area in the past, but resources are being applied to ensure jdk8 is appropriate for embedded deployments. This seems like more low hanging fruit, e.g. not technically difficult, not particularly controversial. Cheers, David

Before any test could be updated, the information would need to be available from the test context. Any feedback/pointers appreciated! See:

    timeoutFactorArg    jtreg/src/share/classes/com/sun/javatest/regtest/Main.java
    runOtherJVM()       jtreg/src/share/classes/com/sun/javatest/regtest/MainAction.java
    maxTimeoutValue     jtreg/src/share/classes/com/sun/javatest/regtest/RegressionParameters.java
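Gary's idea could look roughly like the sketch below. The test.timeout.factor property name is an assumption for illustration - the harness would have to export its -timeoutFactor value to the test JVM under some agreed name for this to work:

```java
public class Timeouts {
    // Hypothetical property the harness would set from -timeoutFactor;
    // defaults to 1.0 when the test is run stand-alone.
    static final double FACTOR =
        Double.parseDouble(System.getProperty("test.timeout.factor", "1.0"));

    // Scale a nominal internal delay by the harness-supplied factor,
    // so slow machines get proportionally longer sleeps and timeouts.
    static long adjusted(long millis) {
        return (long) (millis * FACTOR);
    }

    public static void main(String[] args) throws InterruptedException {
        long delay = adjusted(100);   // 100 ms nominally, longer on slow machines
        System.out.println("sleeping " + delay + " ms");
        Thread.sleep(delay);
    }
}
```

Only the delays that actually guard against slow scheduling would be wrapped this way; fixed protocol pauses would stay unscaled, which addresses David's point that not every sleep() is a timeout.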
Re: Potential timeout issues - tests running in the same vm
On 11/16/11 11:56 AM, Alan Bateman wrote:

On 16/11/2011 16:29, Gary Adams wrote: The jtreg tests that use othervm along with a timeout argument should be fairly reliable in getting a consistent result. The tests that did not specify othervm may run into problems, if they are run concurrently with other tests. Here's a quick sampling of the java/util and java/lang tests that might have issues. Would there be any harm in changing these to designate running in othervm mode?

With jtreg the tests don't run concurrently in the same VM. By default each test is run in its own VM, unless you use the -samevm or -agentvm options, in which case the tests run sequentially in the same (or an agent) VM -- ie: othervm is the default. So I don't think there is a problem and we definitely don't want to add /othervm to tests if we can avoid it (as it slows down the test execution). -Alan

If we could demonstrate that the tests were not reliable in samevm mode on a slow machine, then there might be more evidence that these tests need to be isolated, e.g. even though the test execution is requested to be in an as-fast-as-possible configuration, the test itself may be inherently unreliable when run in that mode.
Re: Potential timeout issues - tests running in the same vm
On 11/16/11 11:59 AM, Kelly O'Hair wrote:

On Nov 16, 2011, at 8:29 AM, Gary Adams wrote: The jtreg tests that use othervm along with a timeout argument should be fairly reliable in getting a consistent result. The tests that did not specify othervm may run into problems, if they are run concurrently with other tests. Here's a quick sampling of the java/util and java/lang tests that might have issues. Would there be any harm in changing these to designate running in othervm mode?

My general feeling is 'No harm', looking at this list (somewhat short anyway) these tests seem like good candidates to be othervm tests anyway. but that's just my opinion. -kto

The list is artificially shortened. I wanted to keep the thread relevant to the core libs fixes in progress for reliable tests. I also wanted to see if there was an overlap with already reported intermittent test failures.

    find jdk/test/java/{util,lang} -type f | grep -v hg/ | xargs grep -n @run | grep timeout | egrep -v othervm

    jdk/test/java/util/ResourceBundle/Bug4168625Test.java:27: @run main/timeout=600 Bug4168625Test
    jdk/test/java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.java:38: * @run main/timeout=7000 CancelledProducerConsumerLoops
    jdk/test/java/util/concurrent/BlockingQueue/MultipleProducersSingleConsumerLoops.java:38: * @run main/timeout=3600 MultipleProducersSingleConsumerLoops
    jdk/test/java/util/concurrent/BlockingQueue/ProducerConsumerLoops.java:38: * @run main/timeout=3600 ProducerConsumerLoops
    jdk/test/java/util/concurrent/BlockingQueue/SingleProducerMultipleConsumerLoops.java:38: * @run main/timeout=600 SingleProducerMultipleConsumerLoops
    jdk/test/java/util/concurrent/ConcurrentHashMap/MapCheck.java:38: * @run main/timeout=240 MapCheck
    jdk/test/java/util/concurrent/ConcurrentHashMap/MapLoops.java:38: * @run main/timeout=1600 MapLoops
    jdk/test/java/util/concurrent/Exchanger/ExchangeLoops.java:38: * @run main/timeout=720 ExchangeLoops
    jdk/test/java/util/concurrent/ExecutorCompletionService/ExecutorCompletionServiceLoops.java:38: * @run main/timeout=3600 ExecutorCompletionServiceLoops
    jdk/test/java/util/concurrent/FutureTask/CancelledFutureLoops.java:38: * @run main/timeout=2000 CancelledFutureLoops
    jdk/test/java/util/concurrent/locks/ReentrantLock/CancelledLockLoops.java:38: * @run main/timeout=2800 CancelledLockLoops
    jdk/test/java/util/concurrent/locks/ReentrantLock/LockOncePerThreadLoops.java:38: * @run main/timeout=15000 LockOncePerThreadLoops
    jdk/test/java/util/concurrent/locks/ReentrantLock/SimpleReentrantLockLoops.java:38: * @run main/timeout=4500 SimpleReentrantLockLoops
    jdk/test/java/util/concurrent/locks/ReentrantReadWriteLock/MapLoops.java:38: * @run main/timeout=4700 MapLoops
    jdk/test/java/util/logging/LoggingDeadlock.java:32: * @run main/timeout=15 LoggingDeadlock
    jdk/test/java/util/logging/LoggingDeadlock2.java:31: * @run main/timeout=15 LoggingDeadlock2
    jdk/test/java/util/logging/LoggingDeadlock3.java:30: * @run main/timeout=80 LoggingDeadlock3
    jdk/test/java/util/logging/LoggingDeadlock4.java:30: * @run main/timeout=15 LoggingDeadlock4
    jdk/test/java/util/zip/ZipFile/ManyEntries.java:27: * @run main/timeout=600 ManyEntries
    jdk/test/java/lang/annotation/AnnotationTypeMismatchException/FoundType.java:29: * @run main/timeout=30 FoundType
    jdk/test/java/lang/management/MemoryMXBean/CollectionUsageThreshold.java:34: * @run main/timeout=300 CollectionUsageThreshold
Re: Potential timeout issues - tests running in the same vm
On 11/16/11 12:29 PM, Alan Bateman wrote:

On 16/11/2011 17:21, Kelly O'Hair wrote: : But if the test naturally runs long enough, any win from using samevm for that particular testcase is kind of minimal. And if using samevm makes the test less reliable or makes the other tests sharing its crib less reliable, then I have no issue giving these babies their own cribs. ;^)

Sure, but I think we need some evidence that these tests are unreliable first, and if the tests are unreliable then they should go onto the problem list so that they aren't run. -Alan.

Agreed! I intend to do a full linux/arm jtreg run with a local BeagleBoard cranked down to 300 MHz. Sounds like I should test the default and samevm mode. I'll start with a jdk7 embedded image to get a baseline.
Re: CR 6860309 - solaris timing issue on thread startup
Where can we find the definition of the tag contents? Whichever way this discussion goes, we should update that documentation with the conclusions.

On 11/15/11 4:29 PM, David Holmes wrote: Alan,

On 15/11/2011 11:26 PM, Alan Bateman wrote: On 15/11/2011 00:56, David Holmes wrote: : 25 * @bug 4820217 6860309 As per other emails, and still waiting for confirmation from Alan.

I don't think the @bug should be updated for any of these test fixes. The @bug tag is intended to list the bug numbers of bugs that are useful when you have a test failure, and sometimes there can be really useful information in a bug that turns out to be a test bug. I'm not aware of any rule that says that the bug numbers shouldn't include bugs that were test-only bugs, but it's almost always bug numbers of bugs that caused the test to be extended to cover additional scenarios. For the tests being discussed here then I think we should just use our best judgment. Worst case, it's easy to dig into the history with hg. -Alan.

That was somewhat non-committal :) To me @bug says these are the bugs that this test is checking the fix for, hence not applicable in any of the recent timing/race test fixes. David
Re: CR 6860309 - solaris timing issue on thread startup
Updated to move static latch to Copier constructor argument. On 11/14/11 11:09 AM, Gary Adams wrote: I've updated the webrev for CR#6860309 using a CountDownLatch. The main thread will wait til both worker threads are ready to block on the read() before the process is destroyed. http://cr.openjdk.java.net/~gadams/6860309/ Tested with -Xcomp, but still may need a older slow solaris/sparc machine to confirm the fix works for the original bug submitter.
Re: CR 6860309 - solaris timing issue on thread startup
On 11/12/11 6:58 AM, Alan Bateman wrote:

On 11/11/2011 16:56, Gary Adams wrote: CR 6860309 - TEST_BUG: Insufficient sleep time in java/lang/Runtime/exec/StreamsSurviveDestroy.java A timing problem is reported on slow Solaris systems for this test, which starts up a process and systematically tortures the underlying threads processing data from the running process. On my fast Solaris machine I cannot reproduce the error, but it is reasonable to assume that on a slower machine there could be scheduling issues that could delay the thread startup past the designated 100 millisecond delay in the main thread. This webrev suggests gating the process destruction until both worker threads are alive. http://cr.openjdk.java.net/~gadams/6860309/

-Xcomp on a slow machine, always fun when testing the untestable. I agree with David but I don't think there is a perfect solution. I would suggest using a CountDownLatch or other synchronization so that the main thread waits until the Copier thread is just about to do the read. Then do a sleep in the main thread before invoking the destroy method. I suspect that is the best that you can do, as it can't be guaranteed that the Copier thread is blocked in the underlying read. -Alan.

I'll give it a try on Monday. Ideally, I could get a reproducible failure before trying to fix the problem. Unfortunately, the only slow machines I have on hand are linux/arm, which might have different issues from the solaris/sparc originally reported.
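Alan's suggestion might look like the following sketch. A pipe stands in for the child process's output stream and close() stands in for Process.destroy(), so this is only an illustration of the latch handshake (signal readiness just before blocking, then a best-effort sleep in main), not the actual StreamsSurviveDestroy test:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.CountDownLatch;

public class CopierLatchSketch {
    // Simplified Copier thread: the latch is passed in via the
    // constructor (as in the updated webrev) and counted down just
    // before the thread blocks in read().
    static class Copier extends Thread {
        private final PipedInputStream in;
        private final CountDownLatch ready;
        Copier(PipedInputStream in, CountDownLatch ready) {
            this.in = in;
            this.ready = ready;
        }
        public void run() {
            byte[] buf = new byte[4096];
            try {
                ready.countDown();      // about to block in read()
                while (in.read(buf) >= 0)
                    ;                   // drain until EOF/close
            } catch (IOException expected) {
                // stream closed out from under us, as destroy() would do
            }
        }
    }

    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        CountDownLatch ready = new CountDownLatch(1);
        Copier c = new Copier(in, ready);
        c.start();
        ready.await();       // copier is at (or entering) the read
        Thread.sleep(100);   // best effort: let it actually block in read()
        out.close();         // stands in for Process.destroy()
        c.join();
        System.out.println("copier exited cleanly");
    }
}
```

As Alan notes, the sleep after await() is only a heuristic: the latch proves the thread reached the line before the read, not that it is blocked inside it.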
Re: Garbage collection race condition before final checks
On 11/ 8/11 11:13 PM, Mandy Chung wrote: Thanks for picking up this bug and fixing this intermittent issue. PlatformLoggingMXBeanTest.java in the same directory has the same issue. It'd be good to fix that with the same CR. These tests were copied from test/java/util/logging/LoggingMXBeanTest.java. I haven't looked in details but I wonder why the test/java/util/logging tests don't have this intermittent issue and I suspect it holds a strong reference. I attempted to break PlatformLoggingMXBeanTest.java with a generous application of GC calls, but I could not get it to fail in the same manner as the other test. It may be failing due to a different condition.
Re: Garbage collection race condition before final checks
On 11/8/11 11:13 PM, Mandy Chung wrote: Gary, Thanks for picking up this bug and fixing this intermittent issue. PlatformLoggingMXBeanTest.java in the same directory has the same issue. It'd be good to fix that with the same CR. These tests were copied from test/java/util/logging/LoggingMXBeanTest.java. I haven't looked in details but I wonder why the test/java/util/logging tests don't have this intermittent issue and I suspect it holds a strong reference.

I'll take a look at the other tests to see if they suffer from the same issue. I was able to force the intermittent bugs to reproduce quickly with a strategic GC in the worst possible place.

A couple of comments on the fix:

On 11/8/2011 7:35 AM, Gary Adams wrote:

    diff --git a/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java b/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java
    --- a/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java
    +++ b/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java

Please add the CR number to the @bug tag.

Will do.

    @@ -42,6 +42,9 @@
         static String LOGGER_NAME_1 = "com.sun.management.Logger";
         static String LOGGER_NAME_2 = "com.sun.management.Logger.Logger2";
         static String UNKNOWN_LOGGER_NAME = "com.sun.management.Unknown";
    +    static Logger logger1;
    +    static Logger logger2;

It'd be good to add a comment why you want to keep a strong reference to the logger. Minor nit: could make them instance variables as they are part of the LoggingMXBeanTest instance.

I'll give it a try.

    +    static LoggingMXBeanTest testp;

         public static void main(String[] argv) throws Exception {
             MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    @@ -51,7 +54,7 @@
                                    LoggingMXBean.class);

             // test LoggingMXBean proxy
    -        LoggingMXBeanTest p = new LoggingMXBeanTest(proxy);
    +        testp = new LoggingMXBeanTest(proxy);

Could we make checkAttributes an instance method of the LoggingMXBeanTest object so that you don't need the static variable?

My preference is to leave a test as intact as possible from how the original test was written, e.g. do the minimum required to make the test reliable. These regression tests are meant to confirm that a particular issue does not slip back into the code base. Changing checkAttributes to be an instance method is not essential to getting the test to work properly. Better to get 10 intermittent tests to work reliably rather than get 1 test polished to perfection. $.02

             // check if the attributes implemented by PlatformLoggingMXBean
             // and LoggingMXBean return the same value
    @@ -59,14 +62,18 @@
                 ManagementFactory.getPlatformMXBean(mbs, PlatformLoggingMXBean.class);
             checkAttributes(proxy, mxbean);
    +
    +        logger1 = null;
    +        logger2 = null;
    +        testp = null;

With the above suggested change, I think these 3 lines are not needed.

OK

         }

         // same verification as in java/util/logging/LoggingMXBeanTest2
         public LoggingMXBeanTest(LoggingMXBean mbean) throws Exception {
    -        Logger logger1 = Logger.getLogger( LOGGER_NAME_1 );
    +        logger1 = Logger.getLogger( LOGGER_NAME_1 );
             logger1.setLevel(Level.FINE);
    -        Logger logger2 = Logger.getLogger( LOGGER_NAME_2 );
    +        logger2 = Logger.getLogger( LOGGER_NAME_2 );
             logger2.setLevel(null);

    @@ -207,6 +214,18 @@
             // verify logger names
             List<String> loggers1 = mxbean1.getLoggerNames();
             List<String> loggers2 = mxbean2.getLoggerNames();
    +
    +        // Print the two logger lists for diagnostic purposes
    +        System.out.println(": LoggingMXBean " + loggers1);
    +        System.out.println(": PlatformLoggingMXBean " + loggers2);
    +
    +        // Print the list of logger level comparisons for diagnostic purposes
    +        for (String logger : loggers1) {
    +            System.out.println(": Level (" + logger
    +                + ", " + mxbean1.getLoggerLevel(logger)
    +                + ", " + mxbean2.getLoggerLevel(logger) + ")");
    +        }

It might be good to keep these diagnostic messages to be printed only when it throws an exception? Extra diagnostic information is good, but I just want to keep the volume of traces reasonable so as not to make the output harder to use.
We can probably delete the diagnostic output, now that the problem is clearly identified.

Thanks again for fixing it. Alan pointed out a pile of test stabilization bugs that have been tagged. I'll see if any of those were suffering from the same issue found here. Mandy
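The underlying fix - holding strong references so that LogManager's weakly-referenced loggers cannot be collected mid-test - can be sketched in isolation. The logger names are borrowed from the test; everything else here is illustrative:

```java
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class LoggerStrongRef {
    // Strong references in static fields: LogManager holds named
    // loggers only weakly, so without these the loggers created below
    // could be GC'ed (and their state lost) before the test checks them.
    static Logger logger1;
    static Logger logger2;

    public static void main(String[] args) {
        logger1 = Logger.getLogger("com.sun.management.Logger");
        logger1.setLevel(Level.FINE);
        logger2 = Logger.getLogger("com.sun.management.Logger.Logger2");
        logger2.setLevel(null);

        System.gc();  // a GC here must not unload the named loggers

        // With the strong references in place, the manager still hands
        // back the very same instance, level setting intact.
        Logger again = LogManager.getLogManager()
                                 .getLogger("com.sun.management.Logger");
        System.out.println(again == logger1);  // prints "true"
    }
}
```

Without the static fields, `getLogger` after the GC may return null or a fresh logger with default state, which is exactly the intermittent race the CR describes.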
Re: Race condition in ThreadGroup stop test
Captured the latest round of comments:
 - more readable initialization
 - allow sleep interruption to terminate main thread
 - added current CR# to @bug tag

/*
 * @test
 * @bug 4176355 7084033
 * @summary Stopping a ThreadGroup that contains the current thread has
 *          unpredictable results.
 */

public class Stop implements Runnable {
    private static boolean groupStopped = false;
    private static final Object lock = new Object();

    private static final ThreadGroup group = new ThreadGroup("");
    private static final Thread first = new Thread(group, new Stop());
    private static final Thread second = new Thread(group, new Stop());

    public void run() {
        while (true) {
            // Give the other thread a chance to start
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
            }

            // When the first thread runs, it will stop the group.
            if (Thread.currentThread() == first) {
                synchronized (lock) {
                    try {
                        group.stop();
                    } finally {
                        // Signal the main thread it is time to check
                        // that the thread group stop was successful
                        groupStopped = true;
                        lock.notifyAll();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Launch two threads as part of the same thread group
        second.start();
        first.start();

        // Wait for the thread group stop to be issued
        synchronized (lock) {
            while (!groupStopped) {
                lock.wait();
                // Give the other thread a chance to stop
                Thread.sleep(1000);
            }
        }

        // Check that the second thread is terminated when the
        // first thread terminates the thread group.
        boolean failed = second.isAlive();

        // Clean up any threads that may have not been terminated
        first.stop();
        second.stop();
        if (failed)
            throw new RuntimeException("Failure.");
    }
}
Re: Timing bugs
Here's an updated diff for the elapsed time check.
 - added current CR# to @bug tag
 - moved capture of start time to after creation of the latches, so only the schedule*() and await() calls are included in the elapsed time check

jdk/test/java/util/Timer/Args.java

 /*
  * @test
- * @bug 6571655 6571881 6574585 6571297
+ * @bug 6571655 6571881 6574585 6571297 6731620
  * @summary Test various args to task scheduling methods
  */
@@ -92,19 +92,21 @@
         new F(){void f(){ t.scheduleAtFixedRate(x, (Date)null, 42); }}
         );

-        final long start = System.currentTimeMillis();
-        final Date past = new Date(start - 10500);
         final CountDownLatch y1 = new CountDownLatch(1);
         final CountDownLatch y2 = new CountDownLatch(1);
         final CountDownLatch y3 = new CountDownLatch(11);
+        final long start = System.currentTimeMillis();
+        final Date past = new Date(start - 10500);
+        final long elapsed;
         schedule(           t, counter(y1), past);
         schedule(           t, counter(y2), past, 1000);
         scheduleAtFixedRate(t, counter(y3), past, 1000);
         y3.await();
         y1.await();
         y2.await();
-        System.out.printf("elapsed=%d%n", System.currentTimeMillis() - start);
-        check(System.currentTimeMillis() - start < 500);
+        elapsed = System.currentTimeMillis() - start;
+        System.out.printf("elapsed=%d%n", elapsed);
+        check(elapsed < 500);

         t.cancel();

On 11/ 4/11 09:36 AM, Gary Adams wrote: I've started to look at timing related bugs that have been open for a while, but have not had sufficient priority to make it to the top of the list of bugs to be fixed. Thought I'd start with some low hanging fruit with simple bug fixes.

6731620: TEST_BUG: java/util/Timer/Args.java is too optimistic about the execution time of System.out.printf

This seems like a simple problem: avoid two calls to get the current time, and eliminate the time to process the print statement from the evaluation of the test elapsed time.
Replacing this sequence:

    System.out.printf("elapsed=%d%n", System.currentTimeMillis() - start);
    check(System.currentTimeMillis() - start < 500);

with:

    elapsed = System.currentTimeMillis() - start;
    System.out.printf("elapsed=%d%n", elapsed);
    check(elapsed < 500);

I plan to test the fix on a 300MHz linux/arm device. I'll provide a proper webrev as soon as I have author rights confirmed. I'm looking for a reviewer and a committer, once I get the fix tested locally. Thanks Gary Adams
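The single-capture pattern in the fix above can be generalized into a small helper. The sketch below is illustrative only; ElapsedSketch and timeOnce are made-up names, not part of Args.java:

```java
public class ElapsedSketch {
    // Measure how long a task takes, capturing the elapsed value exactly
    // once so a later print or check cannot be skewed by re-reading the
    // clock (the bug in the original test).
    static long timeOnce(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeOnce(() -> { /* the work under test */ });
        System.out.printf("elapsed=%d%n", elapsed); // same value as checked below
        if (elapsed >= 500)
            throw new RuntimeException("took too long: " + elapsed + "ms");
    }
}
```

The important property is that the print and the check see the identical captured value, no matter how slow the I/O is.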
Re: Garbage collection race condition before final checks
Here's an updated diff:
 - added current CR# to the @bug tag
 - made logger1 and logger2 instance variables
 - renamed the test instance variable lmxbeantest
 - removed excessive diagnostic printouts

--- a/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java
+++ b/test/java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java
@@ -23,7 +23,7 @@

 /*
  * @test
- * @bug 7024172
+ * @bug 7024172 7067691
  * @summary Test if proxy for PlatformLoggingMXBean is equivalent
  *          to proxy for LoggingMXBean
  *
@@ -43,6 +43,13 @@
     static String LOGGER_NAME_2 = "com.sun.management.Logger.Logger2";
     static String UNKNOWN_LOGGER_NAME = "com.sun.management.Unknown";

+    // These instance variables prevent premature logger garbage collection
+    // See getLogger() weak reference warnings.
+    Logger logger1;
+    Logger logger2;
+
+    static LoggingMXBeanTest lmxbeantest;
+
     public static void main(String[] argv) throws Exception {
         MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
         LoggingMXBean proxy =
@@ -51,7 +58,7 @@
                 LoggingMXBean.class);

         // test LoggingMXBean proxy
-        LoggingMXBeanTest p = new LoggingMXBeanTest(proxy);
+        lmxbeantest = new LoggingMXBeanTest(proxy);

         // check if the attributes implemented by PlatformLoggingMXBean
         // and LoggingMXBean return the same value
@@ -59,14 +66,16 @@
             ManagementFactory.getPlatformMXBean(mbs, PlatformLoggingMXBean.class);
         checkAttributes(proxy, mxbean);
+
+        lmxbeantest = null;
     }

     // same verification as in java/util/logging/LoggingMXBeanTest2
     public LoggingMXBeanTest(LoggingMXBean mbean) throws Exception {
-        Logger logger1 = Logger.getLogger( LOGGER_NAME_1 );
+        logger1 = Logger.getLogger( LOGGER_NAME_1 );
         logger1.setLevel(Level.FINE);
-        Logger logger2 = Logger.getLogger( LOGGER_NAME_2 );
+        logger2 = Logger.getLogger( LOGGER_NAME_2 );
         logger2.setLevel(null);
 /*
@@ -207,6 +216,7 @@
         // verify logger names
         List<String> loggers1 = mxbean1.getLoggerNames();
         List<String> loggers2 = mxbean2.getLoggerNames();
+
         if (loggers1.size() != loggers2.size())
             throw new RuntimeException("LoggerNames: unmatched number of entries");
         List<String> loggers3 = new ArrayList<String>(loggers1);
@@ -219,7 +229,10 @@
             if (!mxbean1.getLoggerLevel(logger)
                     .equals(mxbean2.getLoggerLevel(logger)))
                 throw new RuntimeException(
-                    "LoggerLevel: unmatched level for " + logger);
+                    "LoggerLevel: unmatched level for (" + logger
+                    + ", " + mxbean1.getLoggerLevel(logger)
+                    + ", " + mxbean2.getLoggerLevel(logger) + ")");
+
             if (!mxbean1.getParentLoggerName(logger)
                     .equals(mxbean2.getParentLoggerName(logger)))
                 throw new RuntimeException(

On 11/ 9/11 04:08 AM, Alan Bateman wrote: On 08/11/2011 15:35, Gary Adams wrote: Here's another intermittent bug that is attributed to the garbage collection of the loggers before the final static check can be applied in the test. CR#7067691 : java/lang/management/PlatformLoggingMXBean/LoggingMXBeanTest.java failing intermittently

I agree with Mandy that it would be great to do a quick audit of the other logging (and PlatformLoggingMXBean) tests to see if we have similar issues. I should explain that one reason why these test failures may not have been observed in the past is because most people ran jtreg in its default mode (called othervm mode, where each test runs in its own VM). Nowadays we run the tests via the make file or use jtreg -agentvm so that tests are running sequentially in an agent VM. Not all areas can run in agent VM mode yet, but for the areas that do we get a good speed up as the startup cost is eliminated, and it's also possible to have a pool of agent VMs to make use of all cores when doing test runs (-agentvm -concurrency:auto for example). However, agent VM mode makes things less predictable, and for these tests it means that the GC will be unpredictable, which is why this test was failing only very intermittently.

Anyway, the changes look fine to me. I agree with Mandy that logger1 and logger2 could be instance variables. I would suggest proxy or something more readable rather than testp for the LoggingMXBean proxy.
I also agree with Mandy's comment about adding the @bug. -Alan.
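For background, the failure mode being fixed here comes from LogManager holding loggers only weakly: once nothing else references a logger, it can be collected and a later lookup returns a fresh logger with the configured level lost. A minimal sketch of the strong-reference remedy (the class and logger name below are hypothetical, not from the test):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggerRefSketch {
    // Strong reference: while this field is reachable, the logger and the
    // level configured on it cannot be garbage collected out from under us.
    static final Logger held = Logger.getLogger("sketch.held.logger");

    public static void main(String[] args) {
        held.setLevel(Level.FINE);
        // LogManager only holds loggers weakly. Because `held` pins the
        // logger, this lookup returns the same live instance with its level
        // intact. With only a discarded local reference, a GC in between
        // could clear the weak reference and this lookup might return a
        // fresh logger whose level is null; that is the intermittent
        // failure seen in the test.
        System.out.println(Logger.getLogger("sketch.held.logger").getLevel());
    }
}
```

This is why moving logger1 and logger2 from constructor locals to instance variables (kept alive by the test object) is sufficient to stabilize the test.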
Re: Race condition in TimerTask KillThread test
Here's a revised diff for the KillThread timing problem:
 - added current CR# to @bug tag
 - capture the thread from the timer task
 - wait for the timer task thread to be visible to the main thread, then join the thread, before falling through to attempt the second timer task schedule call

diff --git a/test/java/util/Timer/KillThread.java b/test/java/util/Timer/KillThread.java
--- a/test/java/util/Timer/KillThread.java
+++ b/test/java/util/Timer/KillThread.java
@@ -23,7 +23,7 @@

 /*
  * @test
- * @bug 4288198
+ * @bug 4288198 6818464
  * @summary Killing a Timer thread causes the Timer to fail silently on
  *          subsequent use.
  */
@@ -31,21 +31,27 @@

 import java.util.*;

 public class KillThread {
+    static Thread tdThread;
     public static void main (String[] args) throws Exception {
+        tdThread = null;
         Timer t = new Timer();

         // Start a mean event that kills the timer thread
         t.schedule(new TimerTask() {
             public void run() {
+                tdThread = Thread.currentThread();
                 throw new ThreadDeath();
             }
         }, 0);

         // Wait for mean event to do the deed and thread to die.
         try {
+            do {
                 Thread.sleep(100);
+            } while (tdThread == null);
         } catch(InterruptedException e) {
         }
+        tdThread.join();

         // Try to start another event
         try {

On 11/ 6/11 08:46 PM, David Holmes wrote: On 5/11/2011 8:37 AM, Alan Bateman wrote: On 04/11/2011 15:52, Gary Adams wrote: I'll attempt a simple fix for the KillThread case by replacing

    Thread.sleep(100);

with a simple loop

    do {
        Thread.sleep(100);
    } while (waiting);

where the TimerTask runnable will clear the flag with waiting = false once it has been launched.

I don't think this will guarantee that the timer thread will have terminated before the main thread schedules the second event. I think this test would be robust if you capture a reference to the timer thread in the first task and have the main thread wait or poll until it has a reference to the thread. Once it has a reference to the thread then it can wait for it to terminate before attempting to schedule the second task.

Agreed.
As soon as the waiting flag is cleared we can switch back to the main thread and proceed to the next phase of the test - but the timer thread is still alive at that point. If the issue is checking for thread termination then we must track that directly.

David

-Alan.
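The capture-and-join approach Alan describes might be sketched as a stand-alone example like the one below. The names are made up, and it throws an ordinary RuntimeException rather than ThreadDeath, since any uncaught exception from a TimerTask terminates the timer thread just as lethally:

```java
import java.util.Timer;
import java.util.TimerTask;

public class KillThreadSketch {
    static volatile Thread taskThread; // published by the task, read by main

    // Schedule a task that kills its own timer thread, wait until the task
    // has published a reference to that thread, then join it so the caller
    // knows the thread has fully terminated before moving on.
    static boolean killAndJoin() {
        Timer t = new Timer();
        t.schedule(new TimerTask() {
            public void run() {
                taskThread = Thread.currentThread();
                // Any uncaught exception terminates the timer thread
                // (the original test throws ThreadDeath to the same effect).
                throw new RuntimeException("killing the timer thread");
            }
        }, 0);
        try {
            while (taskThread == null) // poll for the published reference
                Thread.sleep(10);
            taskThread.join();         // returns only once the thread is dead
        } catch (InterruptedException e) {
            return false;
        }
        return !taskThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("timer thread terminated: " + killAndJoin());
    }
}
```

The join is the key step: a flag alone only proves the task started, while join() proves the thread has terminated, which is the precondition the test actually needs.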
Re: Race condition in ThreadGroup stop test
Let's see if this captures all the comments and is a bit more robust in the face of the original race conditions mentioned.
 - threads are started before they are recorded in local variables
 - second is now volatile
 - the thread group stop is triggered by the first thread once the second thread is started
 - the main thread checks for second thread death after the thread group is stopped
 - left in the sleep delays to yield to other threads
 - using wait/notify for the handshake with the main thread
 - using try/finally to ensure the main thread does not check too soon after the thread group stop

public class Stop implements Runnable {
    private static Thread first = null;
    private static volatile Thread second = null;
    private static ThreadGroup group = new ThreadGroup("");
    private static boolean groupStopped = false;
    private static final Object lock = new Object();

    Stop() {
        Thread thread = new Thread(group, this);
        thread.start();
        // Record the threads after they have been started
        if (first == null)
            first = thread;
        else
            second = thread;
    }

    public void run() {
        while (true) {
            try {
                // Give the other thread a chance to start
                Thread.sleep(1000);
            } catch (InterruptedException e) {
            }

            // When the first thread sees the second thread has been started,
            // stop the thread group.
            if ((Thread.currentThread() == first)
                    && (second != null)) {
                synchronized (lock) {
                    try {
                        group.stop();
                    } finally {
                        // Signal the main thread it is time to check
                        // that the thread group stop was successful
                        groupStopped = true;
                        lock.notifyAll();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Launch two threads as part of the same thread group
        for (int i = 0; i < 2; i++)
            new Stop();

        // Wait for the thread group stop to be issued
        synchronized (lock) {
            while (!groupStopped) {
                lock.wait();
                try {
                    // Give the other thread a chance to stop
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                }
            }
        }

        // Check that the second thread is terminated when the
        // first thread terminates the thread group.
        boolean failed = second.isAlive();

        // Clean up any threads that may have not been terminated
        first.stop();
        second.stop();
        if (failed)
            throw new RuntimeException("Failure.");
    }
}
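As an aside, the wait/notify handshake used above could also be expressed with a java.util.concurrent.CountDownLatch, which replaces the explicit lock, flag, and wait loop. This sketch is not part of the proposed fix, just an alternative shape for the same handshake (names are made up):

```java
import java.util.concurrent.CountDownLatch;

public class LatchHandshake {
    // Run the given work on a new thread and block the caller until the
    // worker signals completion. The try/finally guarantees the signal is
    // sent even if the work throws, mirroring the test's use of finally
    // around group.stop().
    static boolean runAndAwait(Runnable work) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown(); // always signal, even on failure
            }
        });
        worker.start();
        try {
            done.await(); // replaces the synchronized wait/notifyAll loop
        } catch (InterruptedException e) {
            return false;
        }
        return done.getCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println("handshake complete: " + runAndAwait(() -> {}));
    }
}
```

A latch cannot "miss" a notification the way a bare notifyAll can if the waiter has not yet taken the lock, which is one reason the wait loop above must re-test the groupStopped flag.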
Re: Timing bugs
On 11/ 6/11 08:05 PM, David Holmes wrote: Hi Gary,

On 4/11/2011 11:36 PM, Gary Adams wrote: I've started to look at timing related bugs that have been open for a while, but have not had sufficient priority to make it to the top of the list of bugs to be fixed. Thought I'd start with some low hanging fruit with simple bug fixes.

Sometimes apparently low-hanging fruit is attached to the tree with steel wire ;-) (and sometimes barbed wire at that)

Had a feeling when I pulled on the fruit the trap door would open in the floor ...

6731620: TEST_BUG: java/util/Timer/Args.java is too optimistic about the execution time of System.out.printf

This seems like a simple problem: avoid two calls to get the current time, and eliminate the time to process the print statement from the evaluation of the test elapsed time. Replacing this sequence:

    System.out.printf("elapsed=%d%n", System.currentTimeMillis() - start);
    check(System.currentTimeMillis() - start < 500);

with:

    elapsed = System.currentTimeMillis() - start;
    System.out.printf("elapsed=%d%n", elapsed);
    check(elapsed < 500);

This seems reasonable on the face of it. But a couple of other observations regarding the test code:

First, the 500 is somewhat arbitrary - the test is preparing three timer tasks with initial expiry times in the past and expecting them all to execute immediately (one multiple times). This immediacy is translated into "completes within 500ms". These types of timing constants should be eradicated if possible, or preferably made configurable if not.

It is a difficult problem to address if you want to try and detect something taking an unexpectedly long time, but have no idea what your execution environment will be. I think I'm signing up for making this one test reliable and not taking on all of the bad timeout assumptions. The 500 millisecond constraint seems reasonable in this case for the work expected to be completed.
Second, the current code won't detect a true failure that results in a hang - the test case will hang and presumably get timed out by the test harness. It might be better to apply a timeout to the await() calls instead and fail only if they do time out.

I thought the hung test was the responsibility of the test harness. It makes for simpler individual tests if that responsibility is delegated to the harness.

Third, in this case it would also be prudent to (re-)read the start time after the setup has been done, i.e. after creating the CountDownLatches. If this is run in samevm mode or agentvm mode those constructions may trigger a full GC and the ensuing delay could cause the test to again fail.

Sounds reasonable. I'll add that change to the fix.

That all said, I'd see combining your suggested fix with moving the point at which start is measured as definite improvements in the test.

Agreed.

I plan to test the fix on a 300MHz linux/arm device. I'll provide a proper webrev as soon as I have author rights confirmed. I'm looking for a reviewer and a committer, once I get the fix tested locally.

I presume by committer you mean somebody to actually do the push for you - I think we generally refer to that as a sponsor (even though it is not part of OpenJDK terminology). In any case that should be someone from the TL group and I'll let them respond further on that and provide any additional review comments. I'm happy to act as another Reviewer.

Thanks.

Cheers, David

Thanks Gary Adams
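David's second observation - bounding the await() calls so a real failure surfaces as a test failure rather than a harness-level timeout - might look like the following sketch. The class name and the 5 second bound are illustrative, not from Args.java:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedAwaitSketch {
    // Wait for the latch, but only up to the given bound. Returns false if
    // the scheduled tasks never ran, letting the test fail with a clear
    // message instead of hanging until the harness kills it.
    static boolean awaitBounded(CountDownLatch latch, long seconds) {
        try {
            return latch.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch y1 = new CountDownLatch(1);
        y1.countDown(); // stands in for the scheduled task having fired
        if (!awaitBounded(y1, 5))
            throw new RuntimeException("scheduled task did not run within 5s");
        System.out.println("ok");
    }
}
```

The trade-off, as noted in the thread, is that the bound reintroduces a timing constant, so delegating hang detection to the harness (scaled by its timeout factor) keeps individual tests simpler.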
Race condition in ThreadGroup stop test
Here's another test with race conditions built into the test that can be easily avoided.

CR#7084033 - TEST_BUG: test/java/lang/ThreadGroup/Stop.java fails intermittently

There are at least two race conditions in the test as currently written. The test uses sleeps as a way to yield time to other threads to complete their tasks, but this may not be sufficient on slower machines.

The first race condition is in the starting of the test threads. To ensure both threads are started, the run check for the first thread should also check that the second thread is not null.

The second race condition, in the main thread, presumes that the group stop has been issued within 3000 milliseconds. A simple loop on the delay can be gated by a flag set after the group stop has been issued.

diff --git a/test/java/lang/ThreadGroup/Stop.java b/test/java/lang/ThreadGroup/Stop.java
--- a/test/java/lang/ThreadGroup/Stop.java
+++ b/test/java/lang/ThreadGroup/Stop.java
@@ -32,6 +32,7 @@
     private static Thread first = null;
     private static Thread second = null;
     private static ThreadGroup group = new ThreadGroup("");
+    private static boolean groupStopped = false;

     Stop() {
         Thread thread = new Thread(group, this);
@@ -47,8 +48,11 @@
         while (true) {
             try {
                 Thread.sleep(1000); // Give other thread a chance to start
-                if (Thread.currentThread() == first)
+                if ((Thread.currentThread() == first)
+                        && (second != null)) {
+                    groupStopped = true;
                     group.stop();
+                }
             } catch(InterruptedException e){
             }
         }
@@ -57,7 +61,10 @@
     public static void main(String[] args) throws Exception {
         for (int i = 0; i < 2; i++)
             new Stop();
+        do {
             Thread.sleep(3000);
+        } while (!groupStopped);
+
         boolean failed = second.isAlive();
         first.stop();
         second.stop();
         if (failed)
Timing bugs
I've started to look at timing related bugs that have been open for a while, but have not had sufficient priority to make it to the top of the list of bugs to be fixed. Thought I'd start with some low hanging fruit with simple bug fixes.

6731620: TEST_BUG: java/util/Timer/Args.java is too optimistic about the execution time of System.out.printf

This seems like a simple problem: avoid two calls to get the current time, and eliminate the time to process the print statement from the evaluation of the test elapsed time. Replacing this sequence:

    System.out.printf("elapsed=%d%n", System.currentTimeMillis() - start);
    check(System.currentTimeMillis() - start < 500);

with:

    elapsed = System.currentTimeMillis() - start;
    System.out.printf("elapsed=%d%n", elapsed);
    check(elapsed < 500);

I plan to test the fix on a 300MHz linux/arm device. I'll provide a proper webrev as soon as I have author rights confirmed. I'm looking for a reviewer and a committer, once I get the fix tested locally. Thanks Gary Adams
Race condition in TimerTask KillThread test
I'm taking a look at some older timing based bugs that may cause problems on slower machines.

6818464: TEST_BUG: Timeout values in several java/util tests are insufficient

I'd like to split this bug into two, based on the example problems that are mentioned in the bug report.

The first example, in java/util/Timer/KillThread.java, is a legitimate race condition. The code will only work correctly if the processing for the TimerTask schedules and fires the runnable within the hard coded 100 milliseconds. This can be fixed with a local variable to synchronize when the second operation can be attempted. (Hard coded sleep times are rarely correct when dealing with slower devices.)

In the second example the test instructions present a timeout to be enforced by the outer test harness, e.g.

    @run main/timeout=20 StoreDeadLock

This type of test limit is easily addressed on slower machines using the test harness timefactor to scale the acceptable test run time.

I'll attempt a simple fix for the KillThread case by replacing

    Thread.sleep(100);

with a simple loop

    do {
        Thread.sleep(100);
    } while (waiting);

where the TimerTask runnable will clear the flag with waiting = false once it has been launched.
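As a stand-alone sketch, the proposed flag loop would look like the example below. The names are made up, and note that, as the replies in this thread point out, clearing the flag only shows the task has started; it does not guarantee the timer thread has terminated (a join is needed for that):

```java
public class WaitFlagSketch {
    // Volatile so the main thread is guaranteed to see the task's write.
    static volatile boolean waiting = true;

    // Poll until the flag is cleared by the task thread. This proves the
    // task has *started*, which bounds the race in the original sleep(100),
    // but it does not prove the task's thread has died.
    static boolean awaitFlag() {
        new Thread(() -> waiting = false).start(); // stands in for the TimerTask
        try {
            do {
                Thread.sleep(100);
            } while (waiting);
        } catch (InterruptedException e) {
            return false;
        }
        return !waiting;
    }

    public static void main(String[] args) {
        System.out.println("task started: " + awaitFlag());
    }
}
```

The volatile modifier matters here: without it, the main thread's loop is not guaranteed to ever observe the write made by the task thread.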