Re: RFR 8021820: Number of opened files used in select() is limited to 1024 [macosx]

Stuart Marks Mon, 05 Aug 2013 18:10:33 -0700

Hi Aleksej,

Thanks for the update. I took a look at the revised test, and there are stillsome issues. (I didn't look at the build changes.)


1) System-specific resource limits.

I think the biggest issue is resource limits on the number of open files perprocess that might vary from system to system. On my Ubuntu system, the hardlimit on the number of open files is 1,024. The test opens 1,023 files and thenone more for the socket. Unfortunately the JVM and jtreg have several filesopen already, and the test crashes before the openFiles() method completes.

(Oddly it crashes with a NoClassDefFoundError from the main thread's uncaughtexception handler, and then the test reports that it passed! Placing atry/catch of Throwable in main() or openFiles() doesn't catch this error. Ihave no explanation for this. When run standalone -- i.e., not from jtreg --the test throws FileNotFoundException (too many open files) from openFiles(),which is expected.)

On my Mac (10.7.5) the soft limit is 256 files, but the hard limit isunlimited. The test succeeds in opening all its files but fails because of theselect() bug you're fixing. (This is expected; I didn't rebuild my JDK withyour patch.) I guess the soft limit doesn't do anything on Mac.


Amazingly, the test passed fine on both Windows XP and Windows 8.

I'm not entirely sure what to do about resource limits. Since the test is ableto open >1024 files on Mac, Windows, and possibly other Linuxes, it seemsreasonable to continue with this approach. If it's possible to catch the errorthat occurs if the test cannot open its initial 1,024 files, perhaps it shoulddo this, log a message indicating what happened, and consider the test to havepassed. I'm mystified by the uncaught/uncatchable NoClassDefFoundError though.


2) Closing files.

If an exception is thrown while opening the initial set of files, or sometimeduring the closing process, the test can still leak files.

One approach would be to keep a data structure representing the current set ofopen files, and close them all in a finally-block around all the test logic,and making sure that exceptions from the close() call are caught and do notprevent the rest of the files from being closed.

This seems like a lot of work. Perhaps a more effective approach would be torun the test in "othervm" mode, as follows:


    @run main/othervm SelectFdsLimit

This will cause the test to run in a dedicated JVM, so all files will be closedautomatically when it exits. (It would be good to add a comment explaining theneed for othervm, if you do this.)


3) Port number for sockets.

It's fairly common for tests to fail occasionally because they use someconstant port number that sometimes happens to be in use at the same time byanother process on the system. I have to say, 8080 is a pretty common portnumber. :-)


For purposes of this test, you can let the system assign a port. Just use:

    new ServerSocket(0)

4) Style.

It's probably possible to use the same File object for the test file, insteadof creating new File objects repeatedly.

It might be nice to add a comment explaining the logic of the test, thatSocketTimeoutException is expected, and that failure will be indicated if theaccept() throws SocketException caused by the underlying mishandling of largefds by select().


Thanks,

s'marks



On 8/5/13 4:47 AM, Aleksej Efimov wrote:

Alan, Tim,

I have addressed your comments and as a result - new webrev:
http://cr.openjdk.java.net/~aefimov/8021820/webrev.01

The list of changes:
1. The connection to Oracle site is removed (it's not internal, but anyway it's
better not to rely on availability of external resource in test). In current
version a server socket is created and accept() method is used for bug 
disclosure.
2. The cleanup method is added for closing file streams. The JTREG successfully
cleaned-up on windows after this modification.
3. common/autoconf/toolchain.m4 untouched, but 'bash
common/autoconf/autogen.sh' was executed to update generated-configure.sh.

Aleksej


On 07/31/2013 06:35 PM, Tim Bell wrote:

Aleksej, Alan

The change in common/autoconf/toolchain.m4 looks correct to me, and I think
that is a good place to have it.  Remember to run 'bash
common/autoconf/autogen.sh' and check in the generated-configure.sh files as
part of the changeset.

I didn't look at the test case, but I think Alan has some good points.

Tim

On 07/31/13 06:45 AM, Alan Bateman wrote:

On 31/07/2013 05:18, Aleksej Efimov wrote:

Hi,
Can I have a review for the following problem:
The MACOSX JDK (more precisely - the java.net classes) uses the select()
system call to wait for different events on sockets fds. And the default
behaviour for select() on Darwin is to fail when fdset contains the fd with
id greater than FDSET_SIZE(=1024). Test case in webrev illustrates this
behavior.
There is at least one solution for it: use -D_DARWIN_UNLIMITED_SELECT
compilation flag for all macosx sources: this won't affect other parts of
JDK because they are not using select().
Currently, I have added this compilation flag to
common/autoconf/generated-configure.sh and
common/autoconf/generated-configure.sh. I wonder, if there is a better
place where I can put this flag?

The webrev: http://cr.openjdk.java.net/~aefimov/8021820/webrev.00/
BUG: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8021820


Thanks for looking into this one. The build changes look okay to me but it's
probably best that someone on build-dev agree to those.
Michael McMahon can probably explain why the net code is using select for
timed read/accept (I have a vague recollection of there being an issue with
poll due to the way that it is implemented on kqueue with the result that it
had to be changed to use select).

I think the test needs re-work. It looks to me that the it attempts to
connect to an Oracle internal site so that's not going to work everywhere.
In general we don't want the tests to be dependent on hosts that may or may
not exist (we had tests that used to this in the past but they caused a lot
of grief). It also looks like the test doesn't close the 1023 files that it
opens at the start and so I assume this test will always fail on Windows
when jtreg tries to clean-up.

-Alan.

Re: RFR 8021820: Number of opened files used in select() is limited to 1024 [macosx]

Reply via email to