Here's the updated webrev with the solution by Paul:
http://cr.openjdk.java.net/~igerasim/8024521/1/webrev/

I've also excluded the test from it.
Instead, I'm going to write manual testing instructions for SQE.

I've tested the fix and it works as expected.

Sincerely yours,
Ivan


On 15.10.2013 10:29, Martin Buchholz wrote:
Hi

I'm the author of this code.
I'm perfectly willing to believe the close/drain code is buggy.
It is difficult to get right.
I'm looking forward to the updated webrev.

Often the best way to handle a stress test is to make it run quickly by default, but provide a system property to make it run as long as desired, for manual ad-hoc testing.
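
For example, sketched with an illustrative property name (not from the actual test):

    // Sketch: run quickly by default, longer when asked to;
    // pass e.g. -Dtest.timeout.millis=600000 for a long manual run
    public class StressSketch {
        public static void main(String[] args) {
            long millis = Long.getLong("test.timeout.millis", 5_000L);
            long deadline = System.currentTimeMillis() + millis;
            while (System.currentTimeMillis() < deadline) {
                runOneIteration();   // hypothetical stress body
            }
        }
        static void runOneIteration() { /* exercise the code under test */ }
    }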


On Mon, Oct 14, 2013 at 9:04 AM, Ivan Gerasimov <ivan.gerasi...@oracle.com> wrote:

    Thanks Paul!

    Yes, I like your solution better.
    I'll test it and send the updated webrev shortly.

    Sincerely yours,
    Ivan



    On 14.10.2013 15:33, Paul Sandoz wrote:

        On Oct 14, 2013, at 11:36 AM, Ivan Gerasimov
        <ivan.gerasi...@oracle.com> wrote:

            Hi Paul,

            Thank you for looking at the webrev!

            On 14.10.2013 12:50, Paul Sandoz wrote:

                Hi Ivan,

                Why do you require closeLock? Is it not sufficient to
                leave processExited as is and just add:

                public synchronized void close() throws IOException {
                    super.close();
                }

            I want to limit the impact on the code; that's why I
            synchronize close() with only processExited().
            If I made close() synchronized with all of the stream's
            methods, I think it could introduce some unpredictable
            drawback.
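
            In rough sketch form the idea is (illustrative, not the
            exact webrev code):

                // A dedicated lock ties close() to processExited()
                // only, leaving the other (already synchronized)
                // stream methods untouched.
                private final Object closeLock = new Object();

                @Override
                public void close() throws IOException {
                    synchronized (closeLock) {
                        super.close();
                    }
                }

                void processExited() {
                    synchronized (closeLock) {
                        // drain this.in and replace it with the
                        // leftovers, as before
                    }
                }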

        Yeah, the close will block until the drain is complete, which
        might also have some unpredictable drawback.

        A concern is that there was already some code in place to check
        for an async close which, if detected, throws away all the data
        and sets the underlying filtered input stream to null. So all
        the effort spent draining is wasted.

        Another solution is to keep checking in the loop of
        drainInputStream for an async close and fail-fast:

        private InputStream drainInputStream(InputStream in)
                throws IOException {
            int n = 0;
            int j;
            byte[] a = null;
            while ((j = in.available()) > 0) {
                if (buf == null) // asynchronous close()?
                    return null; // stop reading and discard
                a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
                n += in.read(a, n, j);
            }

            if (buf == null) // asynchronous close()?
                return null; // discard
            else if (a == null)
                return ProcessBuilder.NullInputStream.INSTANCE;
            else
                return new ByteArrayInputStream(n == a.length ? a : Arrays.copyOf(a, n));
        }

        /** Called by the process reaper thread when the process exits. */
        synchronized void processExited() {
            // Most BufferedInputStream methods are synchronized, but close()
            // is not, and so we have to handle concurrent racing close().
            try {
                InputStream in = this.in;
                if (in != null) {
                    InputStream stragglers = drainInputStream(in);
                    in.close();
                    this.in = stragglers;
                }
            } catch (IOException ignored) {
                // probably an asynchronous close().
            }
        }

        This still has the potential to look at the availability of
        the current input before an async close, and to read from a
        different input after an async close. But it will break out on
        the next loop check if there are more bytes available to read;
        otherwise it will be caught by the final check, so no munged
        data will be exposed.



                Plus there was already some logic checking if close
                was asynchronously called (see BufferedInputStream.close):

                    if (buf == null) // asynchronous close()?
                        this.in = null;

                and this would no longer be required

            Yes, you're right and these two lines can be removed.
            I wouldn't remove the surrounding try {} catch
            (IOException) though, as the stream can still be closed
            asynchronously.
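
            So only those two lines go away; the method keeps roughly
            this shape (a sketch, with the reason the catch stays
            spelled out):

                try {
                    InputStream in = this.in;
                    if (in != null) {
                        InputStream stragglers = drainInputStream(in);
                        // close() racing with us can make in.close(), or
                        // the available()/read() calls inside the drain,
                        // throw; that is why the try/catch must remain:
                        in.close();
                        this.in = stragglers;
                    }
                } catch (IOException ignored) {
                    // asynchronous close() while draining; data discarded
                }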

                I am somewhat struggling to understand the test (not
                too familiar with this area). Requires a more expert
                eye in this area than I.

                IIUC you are trying to induce the output process fd of
                ExecLoop to "flip" to another fd opened by OpenLoop.

            Yes.
            I actually adapted a testcase provided by the reporter of
            the problem.
            I only changed the test so that it can be run in automated
            mode.

            Below is the exact comment from the original testcase.
            It may be a good idea to include it in the regtest.

            /* Following code is a test case that demonstrates a close
             * race in ProcessPipeInputStream.
             *
             * A snippet of the offending code within ProcessPipeInputStream:
             *
             *     // Called by the process reaper thread when the process exits.
             *     synchronized void processExited() {
             *         // Most BufferedInputStream methods are synchronized, but close()
             *         // is not, and so we have to handle concurrent racing close().
             *         try {
             *             InputStream in = this.in;
             *             if (in != null) {
             *                 byte[] stragglers = drainInputStream(in);
             *                 in.close();
             *
             * The comment even talks about the race. However, the code is not
             * adequately protecting against it. drainInputStream() (available()
             * in particular) is not threadsafe when a close happens in parallel.
             * What's worse is that a subsequent open() is likely to reassign the
             * closed fd, so drainInputStream ends up draining the wrong fd!
             * This can cause OOMs, data corruption on other fds, and other very
             * bad things.
             *
             * Notes:
             *  - Test creates a large sparse file in /tmp/bigsparsefile
             *  - Test uses "/proc/meminfo" for simple input.
             *  - Invoke using "-Xmx128m" to ensure the file size of 1G is
             *    greater than the heap size
             *  - Test will run forever; if successful there will not be any
             *    OOM exceptions (on my test system they occur about 1 per minute)
             *  - You need multiple cores (I use 2 x Xeon E5620 = 8 cores,
             *    16 threads).
             *
             * The basic premise of the test is to create a 1GB sparse file.
             * Then have lots of threads opening and closing this file. In a
             * loop within a separate thread use UNIXProcess to run a simple
             * command. We read the output of the command, make sure the
             * process has exited and then close the input stream. When the
             * bug happens, processExited() will use FIONREAD to figure out
             * how much data to drain BUT the fd it is checking will be
             * closed and reopened (now referring to /tmp/bigsparsefile) so
             * it will see the size as 1GB and then attempt to allocate a
             * buffer of that size and then OOM.
             */
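
            For reference, the skeleton of the adapted test looks
            roughly like this (a simplified sketch; the class name,
            thread count, and 60-second budget are illustrative, not
            the exact regtest code):

                import java.io.*;
                import java.util.concurrent.*;

                public class CloseRaceSketch {
                    static final File BIG = new File("/tmp/bigsparsefile");

                    public static void main(String[] args) throws Exception {
                        // 1GB sparse file; run with -Xmx128m so that draining
                        // the wrong fd cannot fit in the heap and OOMs instead
                        try (RandomAccessFile f = new RandomAccessFile(BIG, "rw")) {
                            f.setLength(1L << 30);
                        }

                        // "OpenLoop": threads that open/close the sparse file,
                        // so a just-closed pipe fd number is quickly reused
                        ExecutorService pool = Executors.newFixedThreadPool(8);
                        for (int i = 0; i < 8; i++) {
                            pool.execute(() -> {
                                while (!Thread.currentThread().isInterrupted()) {
                                    try (InputStream is = new FileInputStream(BIG)) {
                                        // open and immediately close
                                    } catch (IOException ignored) { }
                                }
                            });
                        }

                        // "ExecLoop": run a simple command, read its output, and
                        // close the stream while the reaper may still be draining
                        long deadline = System.currentTimeMillis() + 60_000;
                        while (System.currentTimeMillis() < deadline) {
                            Process p = new ProcessBuilder("cat", "/proc/meminfo").start();
                            try (InputStream is = p.getInputStream()) {
                                while (is.read() >= 0) { }   // consume all output
                                p.waitFor();
                            }   // this close() races with processExited()'s drain
                        }
                        pool.shutdownNow();
                    }
                }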



                The test concerns me since it is going to use up loads
                of resources and run for 10 minutes before deciding,
                beyond reasonable doubt, that there is no issue.

            When I ran the test across different machines (with the
            unpatched jdk) it reported from one to five races within
            10 minutes.
            We can surely decrease the timeout, at the price of a higher
            chance of the test missing the race.

        Unit-like tests should really run in under a minute, and should
        ideally be frugal in their resource consumption; otherwise they
        could starve other tests running in parallel. If we cannot
        create a more deterministic test (not sure it is possible) then
        perhaps this test belongs with SQE and not in the JDK tests?

        Paul.



