I understand, better the devil you know argument, if someone wants to make a 2.2.1 release, I'm certainly not against it.

It won't take me long to track down & fix the concurrency bug, if that's the cause. I tend to agree with Greg's assessment that the tests are contrived. The tests make a poor attempt at determining whether the leased resource is still available by checking within a time window around lease expiry, it's subject to so many assumptions that I can't see why that determination is even made. I don't think we can guarantee a lease will expire during a specific time window, most programmers will hopefully renew a lease long before it's due to expire. Until we make that assessment, I think we'll struggle with the tests.

Remember were now dealing with concurrency over network connections. The so called failures may just be due to timing.

The network is asynchronous.

The old code appears to slow down network connections enough to make them less asynchronous, heck even really bad concurrency race conditions and obvious bugs didn't manifest during testing.

Example of how old code makes network less asynchronous:

  1. Thread runs, opens network connection, security check is performed.
  2. Second thread runs, opens network connection, security check is
     performed, first thread is synchronized on a DNS check, the second
     thread waits.
  3. Second thread will complete after first.

New code:

  1. Thread runs, opens network connection, security check is performed
     and returns without contention.
  2. Second thread runs opens network connection, security check is
     performed and returns without contention.
  3. Second thread by chance finishes communication prior to first.

If we look at ClassLoader, the story is similar.

Regards,

Peter.

Dan Creswell wrote:
I feel Bryan has a point.

Putting out a release with just the Levels fix is better than the current release which won't run at all under various circumstances. i.e. It is a small step forward and in no way a backward step.

Such a release mightn't be as much of a step forward as one also featuring the concurrency work *but* the concurrency work is also seemingly amplifying the occurrence of preexisting concurrency problems. It can be said then to perform both better and worse than previous releases.

In essence, the small fix release is a known quantity, the new one less so. We maybe ought to give users the choice as to how much unknown'ness they wish to indulge in.




On 1 April 2013 13:10, Peter <[email protected] <mailto:[email protected]>> wrote:

    The big problem is there are less concurrency bugs in the present
    branch and there are definitely more in the previous release, they
    just haven't presented in test in previous releases, although they
    have in deployment.

    The issue appears to be with javaspaces / lease / transactions.

    It's also a possibility that there's just a timing issue with the
    tests.

    I'd like to fix the tests before releasing, it's unfortunate the
    last fix to jeri seemed to expose these new test failures.  It's a
    little like fixing rust in an old car, the more you remove, the
    more you find.

    What I really need is more help, I don't work on this full time it
    takes longer than a team woking on commercial software.

    Peter.
    ----- Original message -----
    > I am uncomfortable with a release that has known concurrency
    problems.  I
    > am also uncomfortable that the custom Levels serialization change by
    > Oracle [1] has broken the current release for new JVMs.  I would
    like to
    > see a minor release which fixes that serialization problem and a
    candidate
    > release which gives people a chance to discovery concurrent
    issues without
    > risking a release that is known to be unstable.
    >
    > I'm happy to review a few of the classes with known concurrency
    problems
    > to see if I can help nail some of these bugs.  Since I do not
    know the
    > river internals, I would only be able to spot concurrency
    problems that
    > exist within a class.  I am not in a good position to comment on
    > concurrency problems that might arise through the interactions among
    > classes.
    >
    > Bryan
    >
    > [1] https://issues.apache.org/jira/browse/RIVER-416
    >
    > On 4/1/13 6:14 AM, "Peter Firmstone" <[email protected]
    <mailto:[email protected]>> wrote:
    >
    > > The attachments will be removed from the list, so I've cc'd
    you, anyone
> > who's interested, let me know I can forward the attachments. They can
    > > be opened with jvisualvm.
    > >
    > > The profiling isn't perfect, the test runs for about 8.5
    minutes, so
    > > hotspot should have kicked in relatively early in both test runs.
    > >
    > > I guess a significant problem is; the more I remove performance
    > > impedances, like unnecessary DNS calls, the faster
    multithreading and
    > > context switching gets.
    > >
    > > Not only did the old policy providers create contention, but
    it was
    > > slower for single threaded performance (I'll have to run the
    previous
    > > release branch for comparison when I get some time).
    > >
    > > The URIGrant.implies call is now down to .228 ms per
    invocation, down
    > > from 1.68 ms per invocation this week, which was already quite
    good (on
    > > old UltraSparcII hardware), during stress tests this method is
    called
    > > almost 40,000 times.
    > >
    > > In comparison the old policy provider which required a DNS
    call (every
    > > time CodeSource.implies is called, functionality now replaced by
    > > URIGrant.implies), the old policy provider also cached all
    Permission's
    > > in highly contended PermissionCollection's, which during
    network calls
    > > invoked SocketPermission.implies, possibly for every
    SocketPermission in
    > > the PermissionCollection, DNS is also consulted by
    > > SocketPermission.implies, while synchronized, ouch!
    > >
    > > URIGrant.implies is non blocking, that's right zero contention.
    > > SocketPermission.implies DNS calls can be avoided in most cases if
    > > PermissionComparator finds an exact match or wild card.
    > >
    > > This is without the CombinerSecurityManager, which improves
    security
    > > performance by a factor of 10 (SocketPermission is only
    checked once for
    > > each AccessControlContext).
    > >
    > > PreferredClassProvider no longer creates unecessary DNS calls, and
    > > neither does SecureClassLoader,  URLClassLoader or
    PreferredClassLoader.
    > >
    > > So the good news is the next release will feel much faster,
    the bad news
    > > is that existing concurrency bugs that previously didn't
    appear during
    > > test runs, but likely to manifest during production are now
    occurring
    > > during testing.  There's more good news, we fixed a number of
    > > concurrency bugs since the last release too, I guess I have to
    draw the
    > > line somewhere and cut a release.
    > >
    > > DNS calls haven't been completely eliminated as some are still
    > > necessary, but a multitude of unnecessary DNS calls have been
    eliminated.
    > >
    > > Regards,
    > >
    > > Peter.
    > >
    > > Peter Firmstone wrote:
    > > > They've passed more consistently in the past, they're either
    > > > concurrency bugs or network timing related, it would be nice
    to at
    > > > least determine if it's the former or latter.
    > > >
    > > > Cheers,
    > > >
    > > > Peter.
    > > >
    > > > Tom Hobbs wrote:
    > > > > Are these all new failures, i.e. were they working
    before?  Or are they
    > > > > "new" failures in that the test categories have only
    recently been
    > > > > reactivated and the failures discovered?
    > > > >
    > > > > Are they a big enough blocker to stop a release?
    > > > >
    > > > >
    > > > > On Sun, Mar 31, 2013 at 11:31 PM, Peter Firmstone
    <[email protected] <mailto:[email protected]>>
    > > > > wrote:
    > > > >
    > > > >
    > > > > > The following test fails 30 times in a run of 130 tests:
    > > > > >
> > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1 > > > > > [java] # of tests passed = 0 > > > > > [java] # of tests passed = 1
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > >                100
    > > > > >
    > > > > >
    > > > > >
    > > > > >
    > > > > > Buildfile: build.xml
    > > > > >
    > > > > > qa.run-tests:
    > > > > >
    > > > > > james-brown:
    > > > > >    [delete] Deleting directory /opt/src/River_Fixed/**
    > > > > > peterConcurrentPolicy/qa/soul
    > > > > >      [mkdir] Created dir: /opt/src/River_Fixed/**
    > > > > > peterConcurrentPolicy/qa/soul
    > > > > >      [touch] Creating
    > > > > > /opt/src/River_Fixed/**peterConcurrentPolicy/qa/soul/*
    > > > > > *soul.201303312239034808
    > > > > >
    > > > > > run-tests:
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java] CONFIGURATION FILE:
    > > > > >        [java]
> > > > > [java] /opt/src/River_Fixed/**peterConcurrentPolicy/qa/src/**
    > > > > > com/sun/jini/test/resources/**qaHarness.prop
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java] SETTING UP THE TEST LIST:
    > > > > >        [java]
    > > > > >        [java]      Adding test:
    > > > > > com/sun/jini/test/spec/**javaspace/conformance/
    > > > > > **snapshot/**SnapshotExpirationNotifyTest.**td
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java] GENERAL HARNESS CONFIGURATION INFORMATION:
    > > > > >        [java]
    > > > > >        [java]      Date started:
    > > > > >        [java]            Sun Mar 31 22:39:37 EST 2013
    > > > > >        [java]      Installation directory of the JSK:
> > > > > [java] com.sun.jini.jsk.home=/opt/**src/River_Fixed/**
    > > > > > peterConcurrentPolicy
    > > > > >        [java]      Installation directory of the harness:
> > > > > [java] com.sun.jini.qa.home=/opt/src/**River_Fixed/**
    > > > > > peterConcurrentPolicy/qa
    > > > > >        [java]      Categories being tested:
    > > > > >        [java]            categories=No Categories
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java] ENVIRONMENT PROPERTIES:
    > > > > >        [java]
    > > > > >        [java]      JVM information:
    > > > > >        [java]            Java HotSpot(TM) Server VM,
    20.5-b03, 32 bit VM mode
    > > > > >        [java]            Sun Microsystems Inc.
    > > > > >        [java]      OS information:
    > > > > >        [java]            SunOS, 5.10, sparc
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java] STARTING TO RUN THE TESTS
    > > > > >        [java]
    > > > > >        [java]
    > > > > >        [java] Running
    com/sun/jini/test/spec/**javaspace/conformance/**
    > > > > > snapshot/**SnapshotExpirationNotifyTest.**td
    > > > > >        [java] Time is Sun Mar 31 22:39:38 EST 2013
    > > > > >        [java] Starting test in separate process with
    command:
    > > > > >        [java] /usr/jdk/jdk1.6.0_30/jre/bin/**java
    > > > > >
    > > > > >
    -Djava.security.manager=org.**apache.river.api.security.**CombinerSecur
    > > > > > ityManager
    > > > > >
    > > > > > -Djava.security.policy=file:/**opt/src/River_Fixed/**
    > > > > >
    peterConcurrentPolicy/qa/**harness/policy/defaulttest.**policy
    > > > > >
    > > > > >
    -Djava.rmi.server.codebase=htt**p://bluto:9082/qa1-javaspace-**dl.jar<h
    > > > > > ttp://bluto:9082/qa1-javaspace-dl.jar>-cp
    > > > > > /opt/src/River_Fixed/
    > > > > >
    > > > > >
    **peterConcurrentPolicy/qa/lib/**jiniharness.jar:/opt/src/**River_Fixed
    > > > > > /**
    > > > > >
    > > > > >
    peterConcurrentPolicy/qa/lib/**jinitests.jar:/opt/src/River_**
    > > > > >
    > > > > >
    Fixed/peterConcurrentPolicy/**lib/jsk-platform.jar:/opt/src/**River_Fix
    > > > > > ed/
    > > > > >
    > > > > >
    **peterConcurrentPolicy/lib/jsk-**lib.jar:/opt/src/River_Fixed/**
    > > > > >
    > > > > >
    peterConcurrentPolicy/lib/**high-scale-lib.jar:/opt/src/**River_Fixed/*
    > > > > > *
    > > > > >
    > > > > > peterConcurrentPolicy/lib/**custard-apple-1.0.2.jar -ea
    -esa -client
    > > > > >
    -Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
    > > > > >
    > > > > >
    packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
    > > > > > **
    > > > > >
    > > > > > ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
    > > > > > -Dcom.sun.jini.jsk.port=9080 -Dcom.sun.jini.qa.port=9081
    > > > > >
    -Dcom.sun.jini.jsk.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
    > > > > >
    > > > > >
    -Dcom.sun.jini.qa.home=/opt/**src/River_Fixed/**peterConcurrentPolicy/q
    > > > > > a
    > > > > >
    > > > > > -Dcom.sun.jini.qa.harness.**harnessJar=/opt/src/River_**
    > > > > > Fixed/peterConcurrentPolicy/**qa/lib/jiniharness.jar
    > > > > > -Dcom.sun.jini.qa.harness.**testJar=/opt/src/River_Fixed/**
    > > > > > peterConcurrentPolicy/qa/lib/**jinitests.jar
    > > > > > -Dcom.sun.jini.qa.harness.**runjiniserver=true
    > > > > > -Dcom.sun.jini.qa.harness.**runkitserver=true
    > > > > > -Djava.security.properties=*
    > > > > > *file:/opt/src/River_Fixed/**peterConcurrentPolicy/qa/**
    > > > > > harness/trust/dynamic-policy.**properties
    > > > > > -Dcom.sun.jini.qa.harness.**testhosts=
    > > > > >
    -Djava.util.logging.config.**file=/home/peter/logging.**properties
    > > > > >
    > > > > >
    -Dcom.sun.jini.test.home=/opt/**src/River_Fixed/**peterConcurrentPolicy
    > > > > > /qa
    > > > > >
    > > > > > -Dcom.sun.jini.test.port=9082 -Dcom.sun.jini.qa.harness.**
    > > > > >
    policies=file:/opt/src/River_**Fixed/peterConcurrentPolicy/**
    > > > > > qa/src/com/sun/jini/test/**resources/jinitest.policy
    > > > > >
    -Djava.ext.dirs=/usr/jdk/jdk1.**6.0_30/jre/lib/ext:/usr/jdk/**
    > > > > >
    > > > > >
    packages/lib/ext:/opt/src/**River_Fixed/**peterConcurrentPolicy/qa/lib-
    > > > > > **
    > > > > >
    > > > > > ext:/opt/src/River_Fixed/**peterConcurrentPolicy/lib-ext
    > > > > > com.sun.jini.qa.harness.**MasterTest
    com/sun/jini/test/spec/**
    > > > > >
    javaspace/conformance/**snapshot/**SnapshotExpirationNotifyTest.**td
    > > > > >        [java] com.sun.jini.qa.harness.**TestException:
    Not all
    > > > > > listeners've
    > > > > > got expected number of events.
    > > > > >        [java]        at
    com.sun.jini.test.spec.**javaspace.conformance.**
    > > > > > snapshot.**SnapshotExpirationNotifyTest.**run(**
    > > > > > SnapshotExpirationNotifyTest.**java:370)
    > > > > >        [java]        at
    > > > > > com.sun.jini.qa.harness.**MasterTest.doTest(MasterTest.*
    > > > > > *java:256)
    > > > > >        [java]        at
    > > > > > com.sun.jini.qa.harness.**MasterTest.main(MasterTest.**
    > > > > > java:144)
    > > > > >        [java]
    > > > > >        [java] TIME: 10:42:54 PM
    > > > > >        [java]
    > > > > >        [java] Test process was destroyed and returned code 1
    > > > > >        [java]
    > > > > > com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
    > > > > > SnapshotExpirationNotifyTest.**td
    > > > > >        [java] Test Failed: Test Failed:
    > > > > > com.sun.jini.qa.harness.**TestException:
    > > > > > Not all listeners've got expected number of events.
    > > > > >        [java]
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java]
    > > > > >        [java] SUMMARY ==============================**===
    > > > > >        [java]
    > > > > >        [java]
    > > > > > com/sun/jini/test/spec/**javaspace/conformance/**snapshot/**
    > > > > > SnapshotExpirationNotifyTest.**td
    > > > > >        [java] Test Failed: Test Failed:
    > > > > > com.sun.jini.qa.harness.**TestException:
    > > > > > Not all listeners've got expected number of events.
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java]
    > > > > >        [java] # of tests started    = 1
    > > > > >        [java] # of tests completed = 1
    > > > > >        [java] # of tests passed      = 0
    > > > > >        [java] # of tests failed      = 1
    > > > > >        [java]
    > > > > >        [java] ------------------------------**-----------
    > > > > >        [java]
    > > > > >        [java]      Date finished:
    > > > > >        [java]            Sun Mar 31 22:42:59 EST 2013
    > > > > >        [java]      Time elapsed:
    > > > > >        [java]            201 seconds
    > > > > >        [java]
    > > > > >        [java] Java Result: 1
    > > > > >
    > > > > > collect-result:
    > > > > >
    > > > > > BUILD FAILED
    > > > > >
    /opt/src/River_Fixed/**peterConcurrentPolicy/build.**xml:2105: The
    > > > > > following error occurred while executing this line:
    > > > > >
    /opt/src/River_Fixed/**peterConcurrentPolicy/qa/**build.xml:357:
    > > > > > condition satisfied
    > > > > >
    > > > > > Total time: 3 minutes 30 seconds
    > > > > >
    > > > > >
    > > > > >
    > > > >
    > > > >
    > > >
    > > >
    > >
    >



Reply via email to