Thanks Paul!

Yes, I like your solution better.
I'll test it and send the updated webrev shortly.

Sincerely yours,
Ivan


On 14.10.2013 15:33, Paul Sandoz wrote:
On Oct 14, 2013, at 11:36 AM, Ivan Gerasimov <ivan.gerasi...@oracle.com> wrote:

Hi Paul,

Thank you for looking at the webrev!

On 14.10.2013 12:50, Paul Sandoz wrote:
Hi Ivan,

Why do you require closeLock? Is it not sufficient to leave processExited as is 
and just add:

public synchronized void close() throws IOException {
     super.close();
}
I want to limit the impact on the code; that's why I synchronize close() only 
with processExited().
If I made close() synchronized with all the stream's methods, then I think it 
could introduce some unpredictable drawback.

Yeah, the close will block until the drain is complete, which might also have 
some unpredictable drawback.

A concern is there was already some code in place to check for async close, 
which, if detected, throws away all the data and sets the underlying filtered 
input stream to null. So all that effort draining is wasted.

Another solution is to keep checking in the loop of drainInputStream for an 
async close and fail-fast:

         private InputStream drainInputStream(InputStream in)
                 throws IOException {
             int n = 0;
             int j;
             byte[] a = null;
             while ((j = in.available()) > 0) {
                 if (buf == null) // asynchronous close()?
                     return null; // stop reading and discard
                 a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
                 n += in.read(a, n, j);
             }

             if (buf == null) // asynchronous close()?
                 return null; // discard
             else if (a == null)
                 return ProcessBuilder.NullInputStream.INSTANCE;
             else
                return new ByteArrayInputStream(n == a.length ? a :
                        Arrays.copyOf(a, n));
         }

         /** Called by the process reaper thread when the process exits. */
         synchronized void processExited() {
             // Most BufferedInputStream methods are synchronized, but close()
             // is not, and so we have to handle concurrent racing close().
             try {
                 InputStream in = this.in;
                 if (in != null) {
                     InputStream stragglers = drainInputStream(in);
                     in.close();
                     this.in = stragglers;
                 }
             } catch (IOException ignored) {
                 // probably an asynchronous close().
             }
         }

This still has the potential to look at the availability of the current input 
before an async close and read from a different input after an async close, but 
it will break out on the next loop check if there are more bytes available to 
read; otherwise it will be caught by the final check, so no munged data will be 
exposed.
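The fail-fast pattern in isolation might look like this (a sketch only, with an explicit flag standing in for the buf == null check, and FailFastDrain an illustrative name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the fail-fast drain: abandon the work as soon as a concurrent
// close() clears the 'open' flag, both inside the loop and before publishing.
class FailFastDrain {
    static byte[] drain(InputStream in, AtomicBoolean open) throws IOException {
        int n = 0;
        int j;
        byte[] a = null;
        while ((j = in.available()) > 0) {
            if (!open.get())      // asynchronous close()?
                return null;      // stop reading and discard
            a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
            n += in.read(a, n, j);
        }
        if (!open.get())          // final check before exposing the data
            return null;          // discard
        return (a == null) ? new byte[0]
                           : (n == a.length ? a : Arrays.copyOf(a, n));
    }
}
```

Note the two checks mirror the two buf == null tests above: the in-loop check bounds how much wasted work a racing close can cause, and the final check guarantees partially drained data is never published.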



Plus there was already some logic checking if close was asynchronously called 
(see BufferedInputStream.close):

  372                         if (buf == null) // asynchronous close()?
  373                             this.in = null;

and this would no longer be required
Yes, you're right and these two lines can be removed.
I wouldn't remove the surrounding try {} catch (IOException) though, as the 
stream can still be closed asynchronously.

I am somewhat struggling to understand the test (not too familiar with this 
area). It requires a more expert eye in this area than mine.

IIUC you are trying to induce the output process fd of ExecLoop to "flip" to 
another fd opened by OpenLoop.
Yes.
I actually adopted a testcase provided by the reporter of the problem.
I only changed the test in such a way that it can be run in an automated mode.

Below is the exact comment from the original testcase.
It may be a good idea to include it in the regtest.

/* Following code is a test case that demonstrates a close race in
 * ProcessPipeInputStream.
 *
 * A snippet of the offending code within ProcessPipeInputStream:
 *       // Called by the process reaper thread when the process exits.
 *       synchronized void processExited() {
 *           // Most BufferedInputStream methods are synchronized, but close()
 *           // is not, and so we have to handle concurrent racing close().
 *           try {
 *               InputStream in = this.in;
 *               if (in != null) {
 *                   byte[] stragglers = drainInputStream(in);
 *                   in.close();
 *
 * The comment even talks about the race. However, the code is not adequately
 * protecting against it. drainInputStream() (available() in particular) is not
 * threadsafe when a close happens in parallel. What's worse is that a
 * subsequent open() is likely to reassign the closed fd, so drainInputStream
 * ends up draining the wrong fd! This can cause OOMs, data corruption on other
 * fds, and other very bad things.
 *
 * Notes:
 *  - Test creates a large sparse file in /tmp/bigsparsefile
 *  - Test uses "/proc/meminfo" for simple input.
 *  - Invoke using "-Xmx128m" to ensure the file size of 1G is greater than
 *    the heap size
 *  - Test will run forever; if successful there will not be any OOM exceptions
 *    (on my test system they occur about 1 per minute)
 *  - You need multiple cores (I use 2 x Xeon E5620 = 8 cores, 16 threads).
 *
 * The basic premise of the test is to create a 1GB sparse file. Then have lots
 * of threads opening and closing this file. In a loop within a separate
 * thread, use UNIXProcess to run a simple command. We read the output of the
 * command, make sure the process has exited and then close the input stream.
 * When the bug happens, processExited() will use FIONREAD to figure out how
 * much data to drain, BUT the fd it is checking will be closed and reopened
 * (now referring to /tmp/bigsparsefile), so it will see the size as 1GB and
 * then attempt to allocate a buffer of that size and then OOM.
 */



The test concerns me since it is going to use up loads of resources and proceed 
for 10 minutes until deciding beyond reasonable doubt that there is no issue.
When I ran the test across different machines (with the unpatched jdk) it 
reported from one to five races during 10 minutes.
We can surely decrease the timeout at the price of a higher probability of a 
false positive result.

Unit-like tests should really run in under a minute, plus should ideally be 
frugal in their resource consumption otherwise it could starve other tests 
running in parallel. If we cannot create a more deterministic test (not sure it 
is possible) then perhaps this test belongs to SQE and not in the JDK tests?

Paul.
