[
https://issues.apache.org/jira/browse/DERBY-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005332#comment-13005332
]
Kathey Marsden commented on DERBY-4319:
---------------------------------------
I have the machine back in the state where this reproduces and am sorry to say
that there is still a hang in a different method, even with my prior attempt to
get past it, but since I can reproduce now, I should be able to make some
progress on this issue. I'll record some info here in case it becomes hard
to reproduce again.
The current state of the hang involves the launched network server process,
which seems to specify all the drda parameters without values:
cloudtst 6488248 4390978 0 14:41:38 - 0:20 /local1/IBM_JDK/15sr13/sdk/jre/bin/java
-classpath /local1/kmarsden/repro/derby-4319/jars//derby.jar:/local1/kmarsden/repro/derby-4319/jars//derbyrun.jar:/local1/kmarsden/repro/derby-4319/jars//derbyTesting.jar:/local1/kmarsden/repro/derby-4319/jars//junit.jar
-Dderby.drda.logConnections= -Dderby.drda.traceAll= -Dderby.drda.traceDirectory=
-Dderby.drda.keepAlive= -Dderby.drda.timeSlice= -Dderby.drda.host=
-Dderby.drda.portNumber= -Dderby.drda.minThreads= -Dderby.drda.maxThreads=
-Dderby.drda.startNetworkServer= -Dderby.drda.debug=
org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527
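As a side note on what such a command line means to the JVM: an argument of the form -Dname= (nothing after the equals sign) defines the property with an empty string as its value, so System.getProperty returns "" rather than null. The sketch below only illustrates that JVM behavior and one defensive way to read a numeric property. The helper name intProperty and the fallback values are hypothetical, and nothing here claims to show how Derby itself parses these properties.

```java
public class EmptyPropertyDemo {
    // Hypothetical helper (not Derby code): read an int property and fall
    // back to a default when it is unset, empty, or not a valid number.
    public static int intProperty(String name, int fallback) {
        String raw = System.getProperty(name);
        if (raw == null || raw.trim().length() == 0) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Same effect as passing -Dderby.drda.timeSlice= on the command line.
        System.setProperty("derby.drda.timeSlice", "");
        // The property is defined, but its value is the empty string.
        System.out.println(System.getProperty("derby.drda.timeSlice") != null);
        // Integer.parseInt("") would throw, so the helper returns the fallback.
        System.out.println(intProperty("derby.drda.timeSlice", 0));
    }
}
```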
I will attach the javacore with thread dump as
LaunchedNetworkServer.javacore.20110309.160148.6488248.0001.txt
The server threads look pretty normal with a ClientThread running waiting to
accept requests.
The test process is hung in NetworkServerTestSetup.complete(). I am not sure if
it is later or if the change I made just did not work. I will attach the test
process file as:
TestProcess.javacore.20110310.123703.4390978.0001.txt
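For illustration, one general way to keep a test from blocking forever on a call like this is to run the call on a helper thread and wait for it with a timeout. This is a minimal sketch under stated assumptions, not the change that was actually made to NetworkServerTestSetup. The class name BoundedCall and the method callWithTimeout are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    // Runs the task on a daemon helper thread. Returns the task's result,
    // or null if the task has not finished within timeoutMillis.
    public static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "bounded-call");
                t.setDaemon(true); // a stuck worker must not keep the JVM alive
                return t;
            }
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupts the worker. A task blocked in socket I/O may not
                // react to the interrupt, so this only unblocks the caller.
                future.cancel(true);
                return null;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller that gets null back knows the wait timed out and can move on to cleanup, such as destroying the spawned server process.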
If I try to ping the server from the command line I get a "Connection reset" error:
$ java org.apache.derby.drda.NetworkServerControl ping
Thu Mar 10 12:47:39 PST 2011 : Error on client socket:
Connection reset
Thu Mar 10 12:47:39 PST 2011 : Connection reset
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:197)
at java.net.SocketInputStream.read(SocketInputStream.java:116)
at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(NetworkServerControlImpl.java:2873)
at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(NetworkServerControlImpl.java:2817)
at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(NetworkServerControlImpl.java:1253)
at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(NetworkServerControlImpl.java:1228)
at org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2260)
at org.apache.derby.drda.NetworkServerControl.main(NetworkServerControl.java:320)
After that, subsequent ping attempts hang, and a new thread dump of the
Network Server process shows that the ClientThread is no longer there. I
think this should never happen. A lot of work has been put into making
sure that the ClientThread always survives any type of error so that it can
host more connections. See attachment
LaunchedNetworkServerAfterPing.javacore.20110310.124948.6488248.0002.txt
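To make the expectation concrete, here is a generic sketch of an accept loop that survives a failure on an individual connection. It is not Derby's ClientThread. The class ResilientAcceptLoop and its one-byte echo in handle() are hypothetical stand-ins that only show the pattern: catch per-connection errors inside the loop and leave the loop only on a deliberate shutdown or when the server socket itself is closed.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ResilientAcceptLoop implements Runnable {
    private final ServerSocket serverSocket;
    private volatile boolean shutdown;

    public ResilientAcceptLoop(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    // Deliberate shutdown: set the flag first, then unblock accept().
    public void shutdown() throws IOException {
        shutdown = true;
        serverSocket.close();
    }

    public void run() {
        while (!shutdown && !serverSocket.isClosed()) {
            Socket client = null;
            try {
                client = serverSocket.accept();
                handle(client);
            } catch (Exception e) {
                // A bad connection (for example a reset) is logged and
                // dropped. The loop itself keeps running.
                if (!shutdown) {
                    System.err.println("connection failed: " + e);
                }
            } finally {
                if (client != null) {
                    try { client.close(); } catch (IOException ignored) { }
                }
            }
        }
    }

    // Placeholder work: echo a single byte back to the client.
    protected void handle(Socket client) throws IOException {
        int b = client.getInputStream().read();
        if (b < 0) {
            throw new IOException("client closed before sending data");
        }
        client.getOutputStream().write(b);
        client.getOutputStream().flush();
    }
}
```

The isClosed() check keeps the loop from spinning if the server socket is closed unexpectedly. In that case the loop ends, which is the situation a thread dump would reveal as a missing accept thread.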
Another thing to note is that prior to the defaultProperties test there was
actually a stack trace in the setPortPriority test with a Connection reset
which did not cause failure. See TestOutput2011-03-09.txt .out
This issue actually has many facets that are worth working on:
1) How do we make sure a spawned network server process is destroyed if it
hangs the whole suite?
2) Under what circumstances can the Network Server ClientThread that loops
accepting new connections be destroyed?
3) What sort of problem is being caused on AIX by starting network server with
these odd options? I am thinking maybe it is related to soTimeout or keepalive
getting set to an unexpected value, but I am not sure.
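As a rough illustration of question 1, a test harness can give a spawned process a bounded time to exit and then destroy it. The sketch below assumes a Java 8 or later JVM, since Process.waitFor(long, TimeUnit) and destroyForcibly() do not exist on the 1.5 JVM used in this report. The class SpawnedProcessGuard and the method stopWithin are hypothetical names, not part of the Derby test framework.

```java
import java.util.concurrent.TimeUnit;

public class SpawnedProcessGuard {
    // Waits up to timeoutMillis for the process to exit on its own.
    // Returns true if it did, false if it had to be destroyed.
    public static boolean stopWithin(Process process, long timeoutMillis)
            throws InterruptedException {
        if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true; // normal exit within the allowed time
        }
        // Ask politely first, then force termination if that is ignored.
        process.destroy();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();
            process.waitFor();
        }
        return false;
    }
}
```

A teardown method could call stopWithin on the spawned server and report a failure when it returns false, so one stuck server does not hang the whole suite.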
I have been holding off on working on 3 because it provides a good
reproduction for issues 1 and 2, but I think that at this point the best thing
to do would be to disable the problematic fixture on AIX, whether it is
testSetPortPriority or testDefaultProperties. Then I can work on all three
issues in a logical order and pace without release concerns. I'll look into
doing that.
> hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties
> -------------------------------------------------------------------
>
> Key: DERBY-4319
> URL: https://issues.apache.org/jira/browse/DERBY-4319
> Project: Derby
> Issue Type: Bug
> Components: Network Client
> Affects Versions: 10.5.2.0
> Environment: ibm jvm 1.5 SR9-0 on IBM AIX 3.5
> Reporter: Myrna van Lunteren
> Assignee: Kathey Marsden
> Labels: derby_triage10_8
> Attachments: derby-4317_timeout_for_complete_diff.txt,
> derby-4319_teardown_kill_on_bad_ping.txt,
> javacore.20090723.093837.25380.0001.txt,
> javacore.20090723.093909.24726.0001.txt
>
>
> The test run for 10.5.2.0 hung in suites.All. The console output (the run was
> with -Dderby.tests.trace=true) showed ttestDefaultProperties had successfully
> completed but the run was halted.
> ps -eaf | grep java showed the process that kicked off suites.All, and a
> networkserver process with the following flags:
> - classpath <classpath including derby.jar, derbytools.jar, derbyclient.jar,
> derbynet.jar, derbyTesting.jar, derbyrun.jar, derbyTesting.jar and junit.jar>
> -Dderby.drda.logConnections= -Dderby.drda.traceAll=
> -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice=
> -Dderby.drda.host= -Dderby.drda.portNumber= -derby.drda.minThreads=
> -Dderby.drda.maxThreads= -Dderby.drda.startNetworkServer= -Dderby.drda.debug=
> org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527
> This process had been sitting for 2 days.
> After killing the NetworkServerControl process, the test continued
> successfully (except for DERBY-4186, fixed in trunk), but the following was
> put out to the console:
> START-SPAWNED:SpawnedNetworkServer STANDARD OUTPUT: exit code=137
> 2009-07-18 03:16:07.157 GMT : Security manager installed using the Basic server security policy.
> 2009-07-18 03:16:09.169 GMT : Apache Derby Network Server - 10.5.2.0 - (794445) started and ready to accept connections on port 1527
> END-SPAWNED :SpawnedNetworkServer STANDARD OUTPUT:
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira