I understand, better the devil you know argument, if someone wants to
make a 2.2.1 release, I'm certainly not against it.
It won't take me long to track down & fix the concurrency bug, if that's
the cause.
I tend to agree with Greg's assessment that the tests are contrived.
The tests make a poor attempt at determining whether the leased resource
is still available by checking within a time window around lease expiry,
it's subject to so many assumptions that I can't see why that
determination is even made.
I don't think we can guarantee a lease will expire during a specific
time window, most programmers will hopefully renew a lease long before
it's due to expire. Until we make that assessment, I think we'll
struggle with the tests.
Remember were now dealing with concurrency over network connections.
The so called failures may just be due to timing.
The network is asynchronous.
The old code appears to slow down network connections enough to make
them less asynchronous, heck even really bad concurrency race conditions
and obvious bugs didn't manifest during testing.
Example of how old code makes network less asynchronous:
1. Thread runs, opens network connection, security check is performed.
2. Second thread runs, opens network connection, security check is
performed, first thread is synchronized on a DNS check, the second
thread waits.
3. Second thread will complete after first.
New code:
1. Thread runs, opens network connection, security check is performed
and returns without contention.
2. Second thread runs opens network connection, security check is
performed and returns without contention.
3. Second thread by chance finishes communication prior to first.
If we look at ClassLoader, the story is similar.
Regards,
Peter.
Dan Creswell wrote:
I feel Bryan has a point.
Putting out a release with just the Levels fix is better than the
current release which won't run at all under various circumstances.
i.e. It is a small step forward and in no way a backward step.
Such a release mightn't be as much of a step forward as one also
featuring the concurrency work *but* the concurrency work is also
seemingly amplifying the occurrence of preexisting concurrency
problems. It can be said then to perform both better and worse than
previous releases.
In essence, the small fix release is a known quantity, the new one
less so. We maybe ought to give users the choice as to how much
unknown'ness they wish to indulge in.
On 1 April 2013 13:10, Peter <[email protected]
<mailto:[email protected]>> wrote:
The big problem is there are less concurrency bugs in the present
branch and there are definitely more in the previous release, they
just haven't presented in test in previous releases, although they
have in deployment.
The issue appears to be with javaspaces / lease / transactions.
It's also a possibility that there's just a timing issue with the
tests.
I'd like to fix the tests before releasing, it's unfortunate the
last fix to jeri seemed to expose these new test failures. It's a
little like fixing rust in an old car, the more you remove, the
more you find.
What I really need is more help, I don't work on this full time it
takes longer than a team woking on commercial software.
Peter.
----- Original message -----
> I am uncomfortable with a release that has known concurrency
problems. I
> am also uncomfortable that the custom Levels serialization change by
> Oracle [1] has broken the current release for new JVMs. I would
like to
> see a minor release which fixes that serialization problem and a
candidate
> release which gives people a chance to discovery concurrent
issues without
> risking a release that is known to be unstable.
>
> I'm happy to review a few of the classes with known concurrency
problems
> to see if I can help nail some of these bugs. Since I do not
know the
> river internals, I would only be able to spot concurrency
problems that
> exist within a class. I am not in a good position to comment on
> concurrency problems that might arise through the interactions among
> classes.
>
> Bryan
>
> [1] https://issues.apache.org/jira/browse/RIVER-416
>
> On 4/1/13 6:14 AM, "Peter Firmstone" <[email protected]
<mailto:[email protected]>> wrote:
>
> > The attachments will be removed from the list, so I've cc'd
you, anyone
> > who's interested, let me know I can forward the attachments.
They can
> > be opened with jvisualvm.
> >
> > The profiling isn't perfect, the test runs for about 8.5
minutes, so
> > hotspot should have kicked in relatively early in both test runs.
> >
> > I guess a significant problem is; the more I remove performance
> > impedances, like unnecessary DNS calls, the faster
multithreading and
> > context switching gets.
> >
> > Not only did the old policy providers create contention, but
it was
> > slower for single threaded performance (I'll have to run the
previous
> > release branch for comparison when I get some time).
> >
> > The URIGrant.implies call is now down to .228 ms per
invocation, down
> > from 1.68 ms per invocation this week, which was already quite
good (on
> > old UltraSparcII hardware), during stress tests this method is
called
> > almost 40,000 times.
> >
> > In comparison the old policy provider which required a DNS
call (every
> > time CodeSource.implies is called, functionality now replaced by
> > URIGrant.implies), the old policy provider also cached all
Permission's
> > in highly contended PermissionCollection's, which during
network calls
> > invoked SocketPermission.implies, possibly for every
SocketPermission in
> > the PermissionCollection, DNS is also consulted by
> > SocketPermission.implies, while synchronized, ouch!
> >
> > URIGrant.implies is non blocking, that's right zero contention.
> > SocketPermission.implies DNS calls can be avoided in most cases if
> > PermissionComparator finds an exact match or wild card.
> >
> > This is without the CombinerSecurityManager, which improves
security
> > performance by a factor of 10 (SocketPermission is only
checked once for
> > each AccessControlContext).
> >
> > PreferredClassProvider no longer creates unecessary DNS calls, and
> > neither does SecureClassLoader, URLClassLoader or
PreferredClassLoader.
> >
> > So the good news is the next release will feel much faster,
the bad news
> > is that existing concurrency bugs that previously didn't
appear during
> > test runs, but likely to manifest during production are now
occurring
> > during testing. There's more good news, we fixed a number of
> > concurrency bugs since the last release too, I guess I have to
draw the
> > line somewhere and cut a release.
> >
> > DNS calls haven't been completely eliminated as some are still
> > necessary, but a multitude of unnecessary DNS calls have been
eliminated.
> >
> > Regards,
> >
> > Peter.
> >
> > Peter Firmstone wrote:
> > > They've passed more consistently in the past, they're either
> > > concurrency bugs or network timing related, it would be nice
to at
> > > least determine if it's the former or latter.
> > >
> > > Cheers,
> > >
> > > Peter.
> > >
> > > Tom Hobbs wrote:
> > > > Are these all new failures, i.e. were they working
before? Or are they
> > > > "new" failures in that the test categories have only
recently been
> > > > reactivated and the failures discovered?
> > > >
> > > > Are they a big enough blocker to stop a release?
> > > >
> > > >
> > > > On Sun, Mar 31, 2013 at 11:31 PM, Peter Firmstone
<[email protected] <mailto:[email protected]>>
> > > > wrote:
> > > >
> > > >
> > > > > The following test fails 30 times in a run of 130 tests:
> > > > >
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > > [java] # of tests passed =
0
> > > > > [java] # of tests passed =
1
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 100
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Buildfile: build.xml
> > > > >
> > > > > qa.run-tests:
> > > > >
> > > > > james-brown:
> > > > > [delete] Deleting directory /opt/src/River_Fixed/**
> > > > > peterConcurrentPolicy/qa/soul
> > > > > [mkdir] Created dir: /opt/src/River_Fixed/**
> > > > > peterConcurrentPolicy/qa/soul
> > > > > [touch] Creating
> > > > > /opt/src/River_Fixed/**peterConcurrentPolicy/qa/soul/*
> > > > > *soul.201303312239034808
> > > > >
> > > > > run-tests:
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java] CONFIGURATION FILE:
> > > > > [java]
> > > > > [java]
/opt/src/River_Fixed/**peterConcurrentPolicy/qa/src/**
> > > > > com/sun/jini/test/resources/**qaHarness.prop
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java] SETTING UP THE TEST LIST:
> > > > > [java]
> > > > > [java] Adding test:
> > > > > com/sun/jini/test/spec/**javaspace/conformance/
> > > > > **snapshot/**SnapshotExpirationNotifyTest.**td
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java] GENERAL HARNESS CONFIGURATION INFORMATION:
> > > > > [java]
> > > > > [java] Date started:
> > > > > [java] Sun Mar 31 22:39:37 EST 2013
> > > > > [java] Installation directory of the JSK:
> > > > > [java]
com.sun.jini.jsk.home=/opt/**src/River_Fixed/**
> > > > > peterConcurrentPolicy
> > > > > [java] Installation directory of the harness:
> > > > > [java]
com.sun.jini.qa.home=/opt/src/**River_Fixed/**
> > > > > peterConcurrentPolicy/qa
> > > > > [java] Categories being tested:
> > > > > [java] categories=No Categories
> > > > > [java] ------------------------------**-----------
> > > > > [java] ENVIRONMENT PROPERTIES:
> > > > > [java]
> > > > > [java] JVM information:
> > > > > [java] Java HotSpot(TM) Server VM,
20.5-b03, 32 bit VM mode
> > > > > [java] Sun Microsystems Inc.
> > > > > [java] OS information:
> > > > > [java] SunOS, 5.10, sparc
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java] STARTING TO RUN THE TESTS
> > > > > [java]
> > > > > [java]
> > > > > [java] Running
com/sun/jini/test/spec/**javaspace/conformance/**
> > > > > snapshot/**SnapshotExpirationNotifyTest.**td
> > > > > [java] Time is Sun Mar 31 22:39:38 EST 2013
> > > > > [java] Starting test in separate process with
command:
> > > > > [java] /usr/jdk/jdk1.6.0_30/jre/bin/**java
> > > > >
> > > > >
-Djava.security.manager=org.**apache.river.api.security.**CombinerSecur
> > > > > ityManager
> > > > >
> > > > > -Djava.security.policy=file:/**opt/src/River_Fixed/**
> > > > >
peterConcurrentPolicy/qa/**harness/policy/defaulttest.**policy
> > > > >
> > > > >
-Djava.rmi.server.codebase=htt**p://bluto:9082/qa1-javaspace-**dl.jar<h
> > > > > ttp://bluto:9082/qa1-javaspace-dl.jar>-cp
> > > > > /opt/src/River_Fixed/
> > > > >
> > > > >
**peterConcurrentPolicy/qa/lib/**jiniharness.jar:/opt/src/**River_Fixed
> > > > > /**
> > > > >
> > > > >
peterConcurrentPolicy/qa/lib/**jinitests.jar:/opt/src/River_**
> > > > >
> > > > >
Fixed/peterConcurrentPolicy/**lib/jsk-platform.jar:/opt/src/**River_Fix
> > > > > ed/
> > > > >
> > > > >
**peterConcurrentPolicy/lib/jsk-**lib.jar:/opt/src/River_Fixed/**
> > > > >
> > > > >
peterConcurrentPolicy/lib/**high-scale-lib.jar:/opt/src/**River_Fixed/*
> > > > > *
> > > > >
> > > > > peterConcurrentPolicy/lib/**custard-apple-1.0.2.jar -ea
-esa -client
> > > > >
-Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
> > > > >
> > > > >
packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
> > > > > **
> > > > >
> > > > > ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
> > > > > -Dcom.sun.jini.jsk.port=9080 -Dcom.sun.jini.qa.port=9081
> > > > >
-Dcom.sun.jini.jsk.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
> > > > >
> > > > >
-Dcom.sun.jini.qa.home=/opt/**src/River_Fixed/**peterConcurrentPolicy/q
> > > > > a
> > > > >
> > > > > -Dcom.sun.jini.qa.harness.**harnessJar=/opt/src/River_**
> > > > > Fixed/peterConcurrentPolicy/**qa/lib/jiniharness.jar
> > > > > -Dcom.sun.jini.qa.harness.**testJar=/opt/src/River_Fixed/**
> > > > > peterConcurrentPolicy/qa/lib/**jinitests.jar
> > > > > -Dcom.sun.jini.qa.harness.**runjiniserver=true
> > > > > -Dcom.sun.jini.qa.harness.**runkitserver=true
> > > > > -Djava.security.properties=*
> > > > > *file:/opt/src/River_Fixed/**peterConcurrentPolicy/qa/**
> > > > > harness/trust/dynamic-policy.**properties
> > > > > -Dcom.sun.jini.qa.harness.**testhosts=
> > > > >
-Djava.util.logging.config.**file=/home/peter/logging.**properties
> > > > >
> > > > >
-Dcom.sun.jini.test.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
> > > > > /qa
> > > > >
> > > > > -Dcom.sun.jini.test.port=9082 -Dcom.sun.jini.qa.harness.**
> > > > >
policies=file:/opt/src/River_**Fixed/peterConcurrentPolicy/**
> > > > > qa/src/com/sun/jini/test/**resources/jinitest.policy
> > > > >
-Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
> > > > >
> > > > >
packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
> > > > > **
> > > > >
> > > > > ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
> > > > > com.sun.jini.qa.harness.**MasterTest
com/sun/jini/test/spec/**
> > > > >
javaspace/conformance/**snapshot/**SnapshotExpirationNotifyTest.**td
> > > > > [java] com.sun.jini.qa.harness.**TestException:
Not all
> > > > > listeners've
> > > > > got expected number of events.
> > > > > [java] at
com.sun.jini.test.spec.**javaspace.conformance.**
> > > > > snapshot.**SnapshotExpirationNotifyTest.**run(**
> > > > > SnapshotExpirationNotifyTest.**java:370)
> > > > > [java] at
> > > > > com.sun.jini.qa.harness.**MasterTest.doTest(MasterTest.*
> > > > > *java:256)
> > > > > [java] at
> > > > > com.sun.jini.qa.harness.**MasterTest.main(MasterTest.**
> > > > > java:144)
> > > > > [java]
> > > > > [java] TIME: 10:42:54 PM
> > > > > [java]
> > > > > [java] Test process was destroyed and returned code 1
> > > > > [java]
> > > > > com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
> > > > > SnapshotExpirationNotifyTest.**td
> > > > > [java] Test Failed: Test Failed:
> > > > > com.sun.jini.qa.harness.**TestException:
> > > > > Not all listeners've got expected number of events.
> > > > > [java]
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java]
> > > > > [java] SUMMARY ==============================**===
> > > > > [java]
> > > > > [java]
> > > > > com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
> > > > > SnapshotExpirationNotifyTest.**td
> > > > > [java] Test Failed: Test Failed:
> > > > > com.sun.jini.qa.harness.**TestException:
> > > > > Not all listeners've got expected number of events.
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java]
> > > > > [java] # of tests started = 1
> > > > > [java] # of tests completed = 1
> > > > > [java] # of tests passed = 0
> > > > > [java] # of tests failed = 1
> > > > > [java]
> > > > > [java] ------------------------------**-----------
> > > > > [java]
> > > > > [java] Date finished:
> > > > > [java] Sun Mar 31 22:42:59 EST 2013
> > > > > [java] Time elapsed:
> > > > > [java] 201 seconds
> > > > > [java]
> > > > > [java] Java Result: 1
> > > > >
> > > > > collect-result:
> > > > >
> > > > > BUILD FAILED
> > > > >
/opt/src/River_Fixed/**peterConcurrentPolicy/build.**xml:2105: The
> > > > > following error occurred while executing this line:
> > > > >
/opt/src/River_Fixed/**peterConcurrentPolicy/qa/**build.xml:357:
> > > > > condition satisfied
> > > > >
> > > > > Total time: 3 minutes 30 seconds
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
>